Abstract

SAFE is a clean-slate design for a highly secure computer system, with pervasive mechanisms for tracking
and limiting information flows. At the lowest level, the SAFE hardware supports fine-grained programmable
tags, with efficient and flexible propagation and combination of tags as instructions are executed.
The operating system virtualizes these generic facilities to present an information-flow abstract
machine that allows user programs to label sensitive data with rich confidentiality policies. We present a
formal, machine-checked model of the key hardware and software mechanisms used to dynamically control
information flow in SAFE and an end-to-end proof of noninterference for this model.

We use a refinement proof methodology to propagate the noninterference property of the abstract machine
down to the concrete machine level. We use an intermediate layer in the refinement chain that factors
out the details of the information-flow control policy and devise a code generator for compiling such
information-flow policies into low-level monitor code. Finally, we verify the correctness of this generator
using a dedicated Hoare logic that abstracts from low-level machine instructions into a reusable set of
verified structured code generators.
1 Introduction


The SAFE design is motivated by the conviction that the insecurity of present-day computer systems is due 
in large part to legacy design decisions left over from an era of scarce hardware resources. The time is ripe 
for a complete rethink of the entire system stack with security as the central focus. In particular, designers 
should be willing to spend more of the abundant processing power available on today’s chips to improve 
security. 

A key feature of SAFE is that every piece of data, down to the word level, is annotated with a tag 
representing policies that govern its use. While the tagging mechanism is very general [9, 35], one particularly
interesting use of tags is for representing information-flow control (IFC) policies. For example, an
individual record might be tagged “This information should only be seen by principals Alice or Bob,” a 
function pointer might be tagged “This code is trusted to work with Carol’s secrets,” or a string might be 
tagged “This came from the network and has not been sanitized yet.” Such tags representing IFC policies 
can involve arbitrary sets of principals, and principals themselves can be dynamically allocated to represent 
an unbounded number of entities within and outside the system. 

At the programming-language level, rich IFC policies have been extensively explored, with many proposed
designs for static [43, 67, 68, 73, 77, 96] and dynamic [4, 5, 6, 7, 40, 44, 72, 75, 78, 86] enforcement
mechanisms and a huge literature on their formal properties [43, 77, etc.]. Similarly, operating systems with 
information-flow tracking have been a staple of the OS literature for over a decade [36, 54, 55, 66, 97, 97]. 
But progress at the hardware level has been more limited, with most proposals concentrating on hardware 
acceleration for taint-tracking schemes [18, 25, 26, 31, 32, 89, 92]. SAFE extends the state of the art in 
two significant ways. First, the SAFE machine offers hardware support for sound and efficient purely
dynamic tracking of both explicit and implicit flows (i.e., information leaks through both data and control
flow) for arbitrary machine code programs—not just programs accepted by static analysis, or produced 
by translation or transformation. Moreover, rather than using just a few “taint bits,” SAFE associates a 
word-sized tag to every word of data in the machine—both memory and registers. In particular, SAFE tags 
can be pointers to arbitrary data structures in memory. The interpretation of these tags is left entirely to 
software: the hardware just propagates tags from operands to results as each instruction is executed, following
software-defined rules. Second, the SAFE design has been informed from the start by an intensive
effort to formalize critical properties of its key mechanisms and produce machine-checked proofs, in parallel
with the design and implementation of its hardware and system software. Though some prior work
(surveyed in Section 12) shares some of these aims, to the best of our knowledge no project has attempted 
this combination of innovations. 

Abstractly, the tag propagation rules in SAFE can be viewed as a partial function from argument tuples 
of the form (opcode, pc tag, argument1 tag, argument2 tag, ...) to result tuples of the form (new pc tag,
result tag), meaning “if the next instruction to be executed is opcode, the current tag of the program counter
(PC) is pc tag, and the arguments expected by this opcode are tagged argument1 tag, etc., then executing
the instruction is allowed and, in the new state of the machine, the PC should be tagged new pc tag and
any new data created by the instruction should be tagged result tag.” (The individual argument-result pairs
in this function’s graph are called rule instances, to distinguish them from the symbolic rules used at the 
software level.) In general, the graph of this function in extenso will be huge; so, concretely, the hardware 
maintains a cache of recently-used rule instances. On each instruction dispatch (in parallel with the logic 
implementing the usual behavior of the instruction—e.g., addition), the hardware forms an argument tuple 
as described above and looks it up in the rule cache. If the lookup is successful, the result tuple includes a 
new tag for the PC and a tag for the result of the instruction (if any); these are combined with the ordinary 
results of instruction execution to yield the next machine state. Otherwise, if the lookup is unsuccessful, 
the hardware invokes a cache fault handler —a trusted piece of system software with the job of checking 
whether the faulting combination of tags corresponds to a policy violation or whether it should be allowed. 
In the latter case, an appropriate rule instance specifying tags for the instruction’s results is added to the 
cache, and the faulting instruction is restarted. Thus, the hardware is generic and the interpretation of 
policies (e.g., IFC, memory safety or control flow integrity [9, 35]) is programmed in software, with the 
results cached in hardware for common-case efficiency. 
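
The interplay between the rule cache and the fault handler described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names RuleCache, PolicyViolation, and toy_handler are hypothetical, tags are encoded as the integers 0 (public) and 1 (secret), and the cache is an unbounded dictionary rather than real cache hardware.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed encodings for this sketch: opcodes are strings, tags are integers.
ArgTuple = Tuple[str, int, Tuple[int, ...]]   # (opcode, pc tag, argument tags)
ResultTuple = Tuple[int, int]                 # (new pc tag, result tag)


class PolicyViolation(Exception):
    """Raised when the fault handler rejects a combination of tags."""


class RuleCache:
    """Cache of rule instances, filled on demand by a software fault handler."""

    def __init__(self, fault_handler: Callable[[ArgTuple], Optional[ResultTuple]]):
        self.fault_handler = fault_handler
        self.entries: Dict[ArgTuple, ResultTuple] = {}
        self.faults = 0  # number of cache misses seen so far

    def dispatch(self, opcode: str, pc_tag: int, arg_tags) -> ResultTuple:
        key = (opcode, pc_tag, tuple(arg_tags))
        if key not in self.entries:
            # Cache miss: ask the trusted handler whether this is allowed.
            self.faults += 1
            result = self.fault_handler(key)
            if result is None:
                raise PolicyViolation(key)
            # Install the rule instance; the instruction is then "restarted".
            self.entries[key] = result
        return self.entries[key]


def toy_handler(key: ArgTuple) -> Optional[ResultTuple]:
    """Hypothetical policy: only 'add' is allowed; the result takes the higher tag."""
    opcode, pc_tag, arg_tags = key
    if opcode == "add":
        return (pc_tag, max(arg_tags))
    return None
```

The second dispatch of the same argument tuple hits the cache and never reaches the handler, which is the common-case efficiency the design relies on.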

The first contribution of this paper is to explain and formalize, in the Coq proof assistant [90], the key 
ideas in this design via a simplified model of the SAFE machine, embodying its tagging mechanisms in
a distilled form and focusing on enforcing IFC using these general mechanisms. In Section 2, we outline 
the features of the full SAFE system and enumerate the most significant simplifications in our model. In 
Section 3, we present the high-level programming interface of our model, embodied by an abstract IFC 
machine with a built-in, purely dynamic IFC enforcement mechanism and an abstract lattice of IFC labels. 
We then show, in three steps, how this abstract machine can be implemented using the low-level mechanisms 
we propose. The first step introduces a symbolic IFC rule machine that reorganizes the semantics of the 
abstract machine, splitting out the IFC enforcement mechanism into a separate judgment parameterized by 
a symbolic IFC rule table (Section 4). The second step defines a generic concrete machine (Section 5) that 
provides low-level support for efficiently implementing many different high-level policies (IFC and others) 
with a combination of a hardware rule cache and a software fault handler. The final step instantiates 
the concrete machine with a concrete fault handler enforcing IFC. We do this using an IFC fault handler 
generator (Section 6), which compiles the symbolic IFC rule table into a sequence of machine instructions 
implementing the IFC enforcement judgment. 

Our second contribution is a machine-checked proof that this simplified SAFE system is correct and 
secure, in the sense that user code running on the concrete machine equipped with the IFC fault handler 
behaves the same way as on the abstract machine and enjoys the standard noninterference property that 
“high inputs do not influence low outputs.” The interplay of the concrete machine and fault handler is 
complex, so some proof abstraction is essential. (Previous projects such as the CompCert compiler [57],
the seL4 [53, 66] and CertiKOS [39, 82] microkernels, and the RockSalt SFI checker [64] have demonstrated
the need for significant attention to organization in similar proofs.) In our proof architecture, a first
abstraction layer is based on refinement. This allows us to reason in terms of a high-level view of memory,
ignoring the concrete implementation of IFC labels, while setting up the intricate indistinguishability
relation used in the noninterference proof. A second layer of abstraction is required for reasoning about the
correctness of the fault handler. Here, we rely on a verified custom Hoare logic that abstracts from low-level
machine instructions into a reusable set of verified structured code generators.

In Section 7 we prove that the IFC fault handler generator correctly compiles a symbolic IFC rule 
table and a concrete representation of an abstract label lattice into an appropriate sequence of machine 
instructions. We then introduce a standard notion of refinement (Section 8) and show that the concrete 
machine running the generated IFC fault handler refines the abstract IFC machine and vice-versa, using the 
symbolic IFC rule machine as an intermediate refinement point in each direction of the proof (Section 9). 
In our deterministic setting, showing refinement in both directions guarantees that the concrete machine 
does not diverge or get stuck when handling a fault. We next introduce a standard termination-insensitive 
noninterference (TINI) property (Section 10) and show that it holds for the abstract machine. Since deterministic
TINI is preserved by refinement, we conclude that the concrete machine running the generated IFC
fault handler also satisfies TINI. In Section 11, we explain how the programming model and formal development
of the first sections can be extended to accommodate two important features: dynamic memory
allocation and tags representing sets of principals. This extension, carried out after the development of the 
basic model, gives us confidence in the robustness of our methodology. We close with a survey of related 
work (Section 12) and a discussion of future directions (Section 13). Our Coq formalization is available at 
https://github.com/micro-policies/verified-ifc. 

A preliminary abridged version of this work appeared in the proceedings of the POPL 2014 conference [8].
This extended and improved version includes:

• more examples and clarifying explanations in the formal sections; 

• a more detailed technical description of the formalization: the semantics of the abstract, symbolic 
and concrete machines, the language for expressing symbolic IFC rules, our verified structured code 
generators, and TINI-preserving refinements; 

• more details of the proofs; 

• a more extensive discussion of related work, including more recent work on transplanting the tagging 
mechanism of SAFE onto a mainstream RISC processor [30] and using it to enforce properties beyond 
IFC [9, 35]. 





2 Overview of SAFE 


To establish context, we begin with a brief overview of the full SAFE system, concentrating on its OS- and 
hardware-level features. More detailed descriptions can be found elsewhere [29, 33, 34, 35, 45, 46, 56, 62]. 
SAFE’s system software performs process scheduling, stream-based interprocess communication, storage
allocation and garbage collection, and management of the low-level tagging hardware (the focus of this 
paper). The goal is to organize these services as a collection of mutually suspicious compartments following 
the principle of least privilege (a zero-kernel OS [84]), so that an attacker would need to compromise 
multiple compartments to gain complete control of the machine. The system software is programmed in a
combination of assembly and Tempest, a new low-level systems programming language.

The SAFE hardware integrates a number of mechanisms for eliminating common vulnerabilities and 
supporting higher-level security primitives. To begin with, SAFE is (dynamically) typed at the hardware 
level; each data word is indelibly marked as a number, an instruction, a pointer, etc. Next, the hardware 
is memory safe: every pointer consists of a triple of base, bounds, and offset (compactly encoded into 64 
bits [34, 56]), and every pointer operation includes a hardware bounds check [56]. Finally, the hardware 
associates each word in the registers and memory, as well as the PC, with a large (59-bit) tag. The hardware 
rule cache, enabling software-specified propagation of tags from operands to result on each machine step, is 
implemented using a combination of multiple hash functions to approximate a fully-associative cache [33]. 

An unusual feature of the SAFE design is that formal modeling and verification of its core mechanisms 
have played a central role in the design process since the beginning. The original goal of formally specifying
and verifying the entire set of critical runtime services proved to be too ambitious, but key security
properties of simplified models have been verified both at the level of Breeze [45] (a mostly functional, 
security-oriented, dynamic language used for user-level programming on SAFE) and, in the present work, 
at the hardware and abstract machine level. We also used random testing of properties like noninterference 
as a means to speed the design process [46]. 

Our goal in this paper is to develop a clear, precise, and mathematically tractable model of one of the 
main innovations in the SAFE design: its scheme for efficiently supporting high-level data use policies 
using a combination of hardware and low-level system software. To make the model easy to work with, 
we simplify away many important facets of the real SAFE system. In particular, (i) we focus only on IFC 
and noninterference, although the tagging facilities of the SAFE machine are generic and can be applied 
to other policies (more recent work illustrates this point [8, 35]; we return to it at the end of Section 12); 
(ii) we ignore the Breeze and Tempest programming languages and concentrate on the hardware and runtime
services; (iii) we use a stack instead of registers, and we distill the instruction set to just a handful of
opcodes; (iv) we drop SAFE’s fine-grained privilege separation in favor of a more conventional user-mode
/ kernel-mode dichotomy; (v) we shrink the rule cache to a single entry (avoiding issues of replacement and
eviction) and maintain it in kernel memory, accessed by ordinary loads and stores, rather than in specialized
cache hardware; (vi) we focus on termination-insensitive noninterference and omit a large number of more
advanced IFC-related concepts that are supported by the real SAFE system (dynamic principals, downgrading,
public labels, integrity, clearance, etc.); (vii) we handle exceptional conditions, including potential
security violations, by simply halting the whole machine; and (viii) most importantly, we ignore concurrency,
process scheduling, and interprocess communication, assuming instead that the whole machine has a
single, deterministic thread of control. We believe that most of these restrictions can be lifted without
fundamentally changing the structure of the model or of the proofs. For instance, recent follow-on work by some
of the authors [47] discusses a mechanized proof of noninterference for a similar abstract machine featuring
registers and a richer IFC policy. The absence of concurrency is a particularly significant simplification,
given that we are talking about an operating system that offers IFC as a service. However, we conjecture
that it may be possible to add concurrency to our formalization, while maintaining a high degree of determinism,
by adapting the approach used in the proof of noninterference for the seL4 microkernel [65, 66].
We return to this point in Section 13. 
instr ::=        Basic instruction set
  | Add          addition
  | Output       output top of stack
  | Push n       push integer constant
  | Load         indirect load from data memory
  | Store        indirect store to data memory
  | Jump         unconditional indirect jump
  | Bnz n        conditional relative jump
  | Call         indirect call
  | Ret          return

Figure 1: Instruction set

3 Abstract IFC Machine 

We begin the technical development by defining a very simple stack-and-pointer machine with “hard-wired” 
dynamic IFC. This machine concisely embodies the IFC mechanism we want to provide to higher-level 
software and serves as a specification for the symbolic IFC rule machine (Section 4) and for the concrete 
machine (Section 5) running our IFC fault handler (Section 6). The three machines share a tiny instruction 
set (Figure 1) designed to be a convenient target for compiling the symbolic IFC rule table into machine 
instructions (the Coq development formalizes several other instructions, including Sub, Pop, a variant of 
Call that takes a variable number of arguments and a variant of Ret that allows returning a result on the 
stack). All three machines use a fixed instruction memory $\iota$, a partial function from (non-negative) integer
addresses to instructions.

The machine manipulates integers (ranged over by $n$, $m$, and $p$); unlike the real SAFE machine, we
make no distinction between raw integers and pointers (we re-introduce this distinction in Section 11).
Each integer is marked with an individual IFC label (ranged over by $L$) that denotes its security level. We
call a pair of an integer $n$ and its corresponding label $L$ an atom, written $n@L$ and ranged over by $a$. We
assume that IFC labels $L$ form a set $\mathcal{L}$ equipped with a partial order ($\le$), a least upper bound operation
($\vee$), and a bottom element ($\bot$), but do not place further requirements on them. This generality allows us to
model many different kinds of labels present in existing IFC systems [62]. For instance, we might take $\mathcal{L}$ to
be the set of levels $\{\bot, \top\}$ with $\bot \le \top$ and $\bot \vee \top = \top$. Alternatively, we could consider a richer set of
labels, such as finite sets of principals ordered by set inclusion, as discussed in Section 11.
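
As a rough illustration of the two label sets just mentioned, the following Python sketch packages a bottom element, a join, and an order into a record and instantiates it twice. The names Lattice, Atom, TWO_POINT, PRINCIPALS, and add_atoms are hypothetical and do not come from the Coq development; the two-point lattice is encoded with booleans (False for $\bot$, True for $\top$) purely for convenience.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

L = TypeVar("L")


@dataclass(frozen=True)
class Lattice(Generic[L]):
    """A set of labels with a bottom element, a join, and a partial order."""
    bottom: L
    join: Callable[[L, L], L]
    flows: Callable[[L, L], bool]


@dataclass(frozen=True)
class Atom(Generic[L]):
    """An integer paired with its label, written n@L in the text."""
    value: int
    label: L


# Two-point lattice: False stands for bottom, True for top (an assumed encoding).
TWO_POINT = Lattice(
    bottom=False,
    join=lambda a, b: a or b,
    flows=lambda a, b: (not a) or b,
)

# Finite sets of principals ordered by set inclusion.
PRINCIPALS = Lattice(
    bottom=frozenset(),
    join=lambda a, b: a | b,
    flows=lambda a, b: a <= b,
)


def add_atoms(lat: Lattice, x: Atom, y: Atom) -> Atom:
    """Add two atoms; the result carries the join of the operand labels."""
    return Atom(x.value + y.value, lat.join(x.label, y.label))
```

Because the machine only ever uses the bottom element, the join, and the order, the same add_atoms works unchanged for both instances.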

An abstract machine state $\langle \mu\ [\sigma]\ pc \rangle$ consists of a data memory $\mu$, a stack $\sigma$, and a program counter $pc$.
(We sometimes drop the outer brackets.) The data memory $\mu$ is a partial function from integer addresses to
atoms. We write $\mu(p) \leftarrow a$ for the memory that coincides with $\mu$ everywhere except at $p$, where its value
is $a$. The stack $\sigma$ is essentially a list of atoms, but we distinguish stacks beginning with return addresses
(written $pc; \sigma$) from ones beginning with regular atoms (written $a, \sigma$). Formally, stacks are lists with two
“cons” constructors, written “,” and “;”. This distinction is needed so that stack-manipulating instructions
treat frame markers specially; for example, a program that Pushes an integer and then attempts to return
to it is treated as erroneous by the operational semantics. The program counter (PC) $pc$ is an atom whose
label is used to track implicit flows, as explained below.

The step relation of the abstract machine, written $\iota \vdash \mu_1\ [\sigma_1]\ pc_1 \xrightarrow{\alpha} \mu_2\ [\sigma_2]\ pc_2$, is a partial function
taking a machine state to a machine state plus an output action $\alpha$, which can be either an atom or the silent
action $\tau$. We generally omit the instruction memory $\iota$ from transitions because it is fixed. Throughout the
paper we consistently refer to non-silent actions as events (ranged over by $e$).

The stepping rules in Figure 2 adapt a standard purely dynamic IFC enforcement mechanism [4, 75] to
a low-level machine, following recent work by Hriţcu et al. [46]. (Readers less familiar with the intricacies
of dynamic IFC may find some of these side conditions a bit mysterious. A longer explanation can be found
in [46], but the details are not critical for present purposes.) The rule for Add joins ($\vee$) the labels of the
two operands to produce the label of the result, which ensures that the result is at least as classified as each
of the operands. For example, suppose $\iota = [\ldots, \mathsf{Add}, \ldots]$ and $n$ is the index of this Add instruction. Then
$\mu\ [7@\bot, 5@\top]\ n@\bot \xrightarrow{\tau} \mu\ [12@\top]\ (n+1)@\bot$. The rule for Push labels the integer constant added to the
stack as public ($\bot$). The rule for Jump uses join to raise the label of the PC by the label of the target address
of the jump. Similarly, Bnz raises the label of the PC by the label of the tested integer. In both cases the
value of the PC after the instruction depends on data that could be secret, and we use the label of the PC to
track the label of data that has influenced control flow. In order to prevent implicit flows (leaks exploiting
the control flow of the program), the Store rule joins the PC label with the original label of the written
integer and with the label of the pointer through which the write happens. Additionally, since the labels
of memory locations are allowed to vary during execution, we prevent leaking information via labels using
a “no-sensitive-upgrade” check [4, 96] (the $\le$ precondition in the rule for Store).¹ This check prevents
memory locations labeled public from being overwritten when either the PC or the pointer through which
the store happens has been influenced by secrets. The Output rule labels the emitted integer with the join
of its original label and the current PC label.² Finally, because of the structured control flow imposed by
the stack discipline, the rule for Ret can soundly restore the PC label to whatever it was at the time of the
Call. This feature allows programmers to avoid label creep (i.e., having the current PC label inadvertently
go up when branching on secrets unknowingly) by making judicious use of Call and Ret, but may require
careful thought to be used correctly. Many other solutions have been proposed to this problem, each with
its own strengths and weaknesses. Some systems, such as LIO [87], prevent label creep by maintaining a
clearance level that serves as an upper bound on the PC label; this, however, may lead to dynamic errors if
a computation tries to inspect a secret above its clearance.

$$\frac{\iota(n) = \mathsf{Add}}{\mu\ [n_1@L_1, n_2@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(n_1+n_2)@(L_1 \vee L_2), \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Output}}{\mu\ [m@L_1, \sigma]\ n@L_{pc} \xrightarrow{m@(L_1 \vee L_{pc})} \mu\ [\sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Push}\ m}{\mu\ [\sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [m@\bot, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Load} \quad \mu(p) = m@L_2}{\mu\ [p@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [m@(L_1 \vee L_2), \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Store} \quad \mu(p) = k@L_3 \quad L_1 \vee L_{pc} \le L_3 \quad \mu(p) \leftarrow (m@(L_1 \vee L_2 \vee L_{pc})) = \mu'}{\mu\ [p@L_1, m@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu'\ [\sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathsf{Jump}}{\mu\ [n'@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@(L_1 \vee L_{pc})}$$

$$\frac{\iota(n) = \mathsf{Bnz}\ k \quad n' = n + ((m = 0)\ ?\ 1 : k)}{\mu\ [m@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@(L_1 \vee L_{pc})}$$

$$\frac{\iota(n) = \mathsf{Call}}{\mu\ [n'@L_1, a, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [a, (n+1)@L_{pc}; \sigma]\ n'@(L_1 \vee L_{pc})}$$

$$\frac{\iota(n) = \mathsf{Ret}}{\mu\ [n'@L_1; \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@L_1}$$

Figure 2: Semantics of abstract IFC machine
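
The stepping rules of Figure 2 can be transcribed almost line by line into an executable sketch. The Python below is an illustration, not the Coq formalization: it assumes labels are finite sets of principals (so join is union and the order is inclusion), represents the instruction memory as a list of tuples, and uses the hypothetical names step, Stuck, RetAddr, and pop_atoms.

```python
from dataclasses import dataclass

BOT = frozenset()  # bottom label: the empty set of principals (assumed lattice)

def join(*labels):
    """Least upper bound of any number of labels (set union here)."""
    return frozenset().union(*labels)

def flows(l1, l2):
    """The lattice order (set inclusion here)."""
    return l1 <= l2

@dataclass(frozen=True)
class Atom:
    value: int
    label: frozenset

@dataclass(frozen=True)
class RetAddr:
    """Return-address stack entry, i.e. the ';' constructor of the text."""
    pc: Atom

class Stuck(Exception):
    """No stepping rule applies, so the machine halts."""

def pop_atoms(stack, k):
    """Split off the top k stack entries, which must all be regular atoms."""
    top, rest = stack[:k], stack[k:]
    if len(top) < k or any(not isinstance(x, Atom) for x in top):
        raise Stuck("expected %d atoms on top of the stack" % k)
    return top, rest

def step(instrs, mem, stack, pc):
    """One machine step: returns (mem, stack, pc, action); action is None if silent."""
    n, lpc = pc.value, pc.label
    if not 0 <= n < len(instrs):
        raise Stuck("no instruction at pc")
    op, *args = instrs[n]
    nxt = Atom(n + 1, lpc)
    if op == "Push":
        return mem, [Atom(args[0], BOT)] + stack, nxt, None
    if op == "Ret":
        if not stack or not isinstance(stack[0], RetAddr):
            raise Stuck("Ret needs a return address on top of the stack")
        return mem, stack[1:], stack[0].pc, None  # restores the saved PC label
    if op == "Add":
        (a1, a2), rest = pop_atoms(stack, 2)
        return mem, [Atom(a1.value + a2.value, join(a1.label, a2.label))] + rest, nxt, None
    if op == "Load":
        (p,), rest = pop_atoms(stack, 1)
        if p.value not in mem:
            raise Stuck("load from undefined address")
        m = mem[p.value]
        return mem, [Atom(m.value, join(p.label, m.label))] + rest, nxt, None
    if op == "Store":
        (p, m), rest = pop_atoms(stack, 2)
        if p.value not in mem:
            raise Stuck("store to undefined address")
        if not flows(join(p.label, lpc), mem[p.value].label):
            raise Stuck("no-sensitive-upgrade check failed")
        new = Atom(m.value, join(p.label, m.label, lpc))
        return {**mem, p.value: new}, rest, nxt, None
    if op == "Jump":
        (t,), rest = pop_atoms(stack, 1)
        return mem, rest, Atom(t.value, join(t.label, lpc)), None
    if op == "Bnz":
        (m,), rest = pop_atoms(stack, 1)
        target = n + (1 if m.value == 0 else args[0])
        return mem, rest, Atom(target, join(m.label, lpc)), None
    if op == "Call":
        (t, a), rest = pop_atoms(stack, 2)
        return mem, [a, RetAddr(nxt)] + rest, Atom(t.value, join(t.label, lpc)), None
    if op == "Output":
        (m,), rest = pop_atoms(stack, 1)
        return mem, rest, nxt, Atom(m.value, join(m.label, lpc))
    raise Stuck("unknown opcode")
```

Raising Stuck corresponds to the simplification that exceptional conditions, including potential security violations, halt the whole machine.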

All data in the machine’s state are labelled, and this simple machine manages labels to ensure noninterference
as defined and proved in Section 10. There are no instructions that dynamically raise the label
(classification) of an atom. Such an instruction, joinP, is added to the machine in Section 11.

¹ More recent work further improves precision compared to the no-sensitive-upgrades policy [5, 15, 44, 46]. We adopted
no-sensitive-upgrades in this work because it is simpler and requires less bookkeeping.

² We assume the observer of the events generated by Output is constrained by the rules of information flow, i.e., cannot freely
“look inside” bare events. In the real SAFE machine, atoms being sent to the outside world need to be protected cryptographically;
we are abstracting this away.


4 Symbolic IFC Rule Machine 

In the abstract machine described above, IFC is tightly integrated into the step relation in the form of side 
conditions on each instruction. In contrast, the concrete machine (i.e., the “hardware”) described in Section 5
is generic, designed to support a wide range of software-defined policies (IFC and other). The
machine introduced in this section serves as a bridge between these two models. It is closer to the abstract 
machine—indeed, its machine states and the behavior of the step relation are identical. The important 
difference lies in the definition of the step relation, where all the IFC-related aspects are factored out into 
a separate judgment. We can think of the IFC mechanism as being implemented in a separate “IFC rule 
processor” distinct from the main “CPU.” In the concrete machine, the CPU part will remain unchanged, 
but the IFC rule processor will be implemented mostly in software (by the fault handler), with the hardware 
only providing caching of rule instances. While factoring out IFC enforcement into a separate reference 
monitor [80] is commonplace [1, 75, 78], our approach goes further. We define a small DSL for describing 
symbolic IFC rules and obtain actual monitors by interpreting this DSL (in this section) and by compiling 
it into machine instructions using verified structured code generators (in Section 6 and Section 7). This 
architecture makes it easier to implement other IFC mechanisms (e.g., permissive upgrades [5]), beyond 
the simple one in Section 3. Since the DSL compilation is verified, we prove that the concrete machine 
of Section 5 is noninterfering when given any correct monitor written in the DSL. Showing that a monitor
is correct, on the other hand, involves a simple refinement proof (Lemma 9.2), and a noninterference
proof for the abstract machine (Theorem 10.5), but is independent of the code generation infrastructure and 
corresponding proofs. 

More formally, each stepping rule of the new machine (see Figure 3) includes a uniform call to an IFC 
enforcement relation, which itself is parameterized by a symbolic IFC rule table $\mathcal{R}$. Given the labels of
the values relevant to an instruction, the IFC enforcement relation (i) checks whether the execution of that
instruction is allowed in the current configuration, and (ii) if so, yields the labels to put on the resulting
PC and on any resulting value. This judgment has the form $\vdash_{\mathcal{R}} (L_{pc}, \ell_1, \ell_2, \ell_3) \leadsto_{op} L_{rpc}, L_r$, where the
4-tuple on the left-hand side represents the input PC label and three additional input labels (more precisely,
optional labels, as the number of relevant labels depends on the opcode but the tuple is of fixed size), $op$ is
an opcode, and $L_{rpc}$ and $L_r$ are the resulting output labels (of which the second might be ignored).

Let us illustrate, for a few cases, how this new judgment is used in the stepping relation (Figure 3). The 
stepping rule for Add passes three inputs to the IFC enforcement judgment: $L_{pc}$, the label of the current
PC, and $L_1$ and $L_2$, the labels of the two operands at the top of the stack. (The fourth element of the input
tuple is written as _ because it is not needed for Add.) The IFC enforcement judgment produces two labels:
$L_{rpc}$ is used to label the next program counter ($n+1$) and $L_r$ is used to label the result value. All the other
stepping rules follow a similar scheme. (The one for Store uses all four input labels. In this stepping rule
the resulting label $L_r$ is used to label the new value $m$ to be stored at location $p$.)
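
The way the Add stepping rule delegates every label decision to the enforcement judgment can be sketched as a step function that takes the judgment as a parameter. This is a minimal Python illustration under assumptions: labels are the integers 0 and 1 of a two-point lattice, atoms are (value, label) pairs, and the names step_add, abs_rules, and paranoid_rules are hypothetical; paranoid_rules is an invented stricter policy that is not part of the source.

```python
from typing import Callable, List, Optional, Tuple

Label = int                     # assumed two-point lattice: 0 = bottom, 1 = top
Atom = Tuple[int, Label]        # (value, label)
Inputs = Tuple[Label, Optional[Label], Optional[Label], Optional[Label]]
Outputs = Optional[Tuple[Label, Optional[Label]]]   # (Lrpc, Lr) or None if disallowed
Enforce = Callable[[str, Inputs], Outputs]


def step_add(enforce: Enforce, stack: List[Atom], pc: Atom):
    """Symbolic-machine step for Add: all label handling goes through `enforce`."""
    if len(stack) < 2:
        return None                      # stuck: not enough operands
    (n1, l1), (n2, l2), *rest = stack
    n, lpc = pc
    out = enforce("add", (lpc, l1, l2, None))   # fourth input unused for Add
    if out is None:
        return None                      # the policy rejects this step
    lrpc, lr = out
    return [(n1 + n2, lr)] + rest, (n + 1, lrpc)


def abs_rules(op: str, labels: Inputs) -> Outputs:
    """The abstract machine's behaviour for Add: keep the PC label, join the operands."""
    lpc, l1, l2, _ = labels
    if op == "add":
        return (lpc, max(l1, l2))
    return None


def paranoid_rules(op: str, labels: Inputs) -> Outputs:
    """Hypothetical stricter policy: refuse to add while the PC label is secret."""
    lpc, l1, l2, _ = labels
    if op == "add" and lpc == 0:
        return (lpc, max(l1, l2))
    return None
```

Swapping abs_rules for paranoid_rules changes the enforced policy without touching step_add, which is the separation between the "CPU" and the "IFC rule processor" that this section describes.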

A symbolic IFC rule table $\mathcal{R}$ describes a particular IFC enforcement mechanism. For instance, the rule
table $\mathcal{R}^{abs}$ corresponding to the IFC mechanism of the abstract machine is shown in Figure 4. In general,
a table $\mathcal{R}$ associates a symbolic IFC rule to each instruction opcode (formally, $\mathcal{R}$ is a total function). Each
of these rules is formed of three symbolic expressions: (i) a boolean expression indicating whether the
execution of the instruction is allowed or not (i.e., whether it violates the IFC enforcement mechanism);
(ii) a label-valued expression for $L_{rpc}$, the label of the next PC; and (iii) a label-valued expression for $L_r$,
the label of the result value, if there is one. In cases where $L_r$ is not used by the corresponding opcode, we
write _ to mean “don’t care,” which is a synonym for BOT (the symbolic representation of the $\bot$ label).

These symbolic expressions are written in a simple domain-specific language (DSL) of operations over
an IFC lattice. The grammar of this DSL (Figure 5) includes label variables $\mathtt{LAB}_{pc}, \ldots, \mathtt{LAB}_3$, which correspond
to the input labels $L_{pc}, \ldots, L_3$; the constant BOT; and the lattice operators $\sqcup$ (join) and $\sqsubseteq$ (flows).

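$$\frac{\iota(n) = \mathsf{Add} \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, \_) \leadsto_{add} L_{rpc}, L_r}{\mu\ [n_1@L_1, n_2@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(n_1+n_2)@L_r, \sigma]\ (n+1)@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Output} \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, \_, \_) \leadsto_{output} L_{rpc}, L_r}{\mu\ [m@L_1, \sigma]\ n@L_{pc} \xrightarrow{m@L_r} \mu\ [\sigma]\ (n+1)@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Push}\ m \quad \vdash_{\mathcal{R}} (L_{pc}, \_, \_, \_) \leadsto_{push} L_{rpc}, L_r}{\mu\ [\sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [m@L_r, \sigma]\ (n+1)@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Load} \quad \mu(p) = m@L_2 \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, \_) \leadsto_{load} L_{rpc}, L_r}{\mu\ [p@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [m@L_r, \sigma]\ (n+1)@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Store} \quad \mu(p) = k@L_3 \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, L_3) \leadsto_{store} L_{rpc}, L_r \quad \mu(p) \leftarrow m@L_r = \mu'}{\mu\ [p@L_1, m@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu'\ [\sigma]\ (n+1)@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Jump} \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, \_, \_) \leadsto_{jump} L_{rpc}, \_}{\mu\ [n'@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Bnz}\ k \quad n' = n + ((m = 0)\ ?\ 1 : k) \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, \_, \_) \leadsto_{bnz} L_{rpc}, \_}{\mu\ [m@L_1, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Call} \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, \_, \_) \leadsto_{call} L_{rpc}, L_r}{\mu\ [n'@L_1, a, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [a, (n+1)@L_r; \sigma]\ n'@L_{rpc}}$$

$$\frac{\iota(n) = \mathsf{Ret} \quad \vdash_{\mathcal{R}} (L_{pc}, L_1, \_, \_) \leadsto_{ret} L_{rpc}, \_}{\mu\ [n'@L_1; \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [\sigma]\ n'@L_{rpc}}$$

Figure 3: Semantics of symbolic rule machine, parameterized by $\mathcal{R}$

$$\begin{array}{l|l|l|l}
\text{opcode} & allow & e_{rpc} & e_r \\ \hline
\mathsf{add} & \mathtt{TRUE} & \mathtt{LAB}_{pc} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_2 \\
\mathsf{output} & \mathtt{TRUE} & \mathtt{LAB}_{pc} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_{pc} \\
\mathsf{push} & \mathtt{TRUE} & \mathtt{LAB}_{pc} & \mathtt{BOT} \\
\mathsf{load} & \mathtt{TRUE} & \mathtt{LAB}_{pc} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_2 \\
\mathsf{store} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_{pc} \sqsubseteq \mathtt{LAB}_3 & \mathtt{LAB}_{pc} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_2 \sqcup \mathtt{LAB}_{pc} \\
\mathsf{jump} & \mathtt{TRUE} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_{pc} & \_ \\
\mathsf{bnz} & \mathtt{TRUE} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_{pc} & \_ \\
\mathsf{call} & \mathtt{TRUE} & \mathtt{LAB}_1 \sqcup \mathtt{LAB}_{pc} & \mathtt{LAB}_{pc} \\
\mathsf{ret} & \mathtt{TRUE} & \mathtt{LAB}_1 & \_
\end{array}$$

Figure 4: Rule table $\mathcal{R}^{abs}$ corresponding to abstract IFC machine

The IFC enforcement judgment looks up the corresponding symbolic IFC rule in the table and directly
evaluates the symbolic expressions in terms of the corresponding lattice operations. In contrast, in Section 6
we compile this rule table into the IFC fault handler for the concrete machine. Formally, the IFC
enforcement judgment is defined by the two following cases, depending on whether the second output label
is relevant or not:

$$\frac{Rule_{\mathcal{R}}(op) = \langle allow, e_{rpc}, e_r \rangle \quad \rho \vdash allow \quad \rho \vdash e_{rpc} \downarrow L_{rpc} \quad \rho \vdash e_r \downarrow L_r}{\vdash_{\mathcal{R}} \rho \leadsto_{op} L_{rpc}, L_r}$$

$$\frac{Rule_{\mathcal{R}}(op) = \langle allow, e_{rpc}, \_ \rangle \quad \rho \vdash allow \quad \rho \vdash e_{rpc} \downarrow L_{rpc}}{\vdash_{\mathcal{R}} \rho \leadsto_{op} L_{rpc}, \_}$$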
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LE, Cr, 


BOT 

LABi 

BE, allow 

1 TRUE 

LAB 2 


1 LE\ C LE 2 

LAB 3 

LABpc 

LEf U I/E 2 

~LE) 


AND BEi BE 2 


OR BEi BE 2 

1 (BE) 


Figure 5: Symbolic IFC rule language syntax 

p h LE\ ^ Z/i p\- LE2 ir L2 Li < L2 p\~ BEi p h BE2 
p \~ TRUE p h LE\ C LE2 p AND BEi BE2 

p h BE\ p h BE2 p LEi 4, ii p\~ LE2 4, Z/2 

p \~ OR BEi BE2 p F OR BEi BE2 p F BOT ^ -L p F {LEi U LE2) 4 ' {LiWL2) 

{Lpc,ii,i 2 ,£ 3 ) F LABpc 4 ^ Lpc {Lpa,li, ^2,^3) F LAB2 F L2 

{Lpc, ii, £2, ^3) F LABi 4, Li {Lpc,ii,l2, L3) F LAB3 4, L3 


Figure 6 : Symbolic IFC rule language semantics 


Here $\rho$ is a 4-tuple of labels, $\mathit{Rule}_{\mathcal{R}}$ looks up the relevant opcode in rule table $\mathcal{R}$, and the expression
evaluation judgment $\rho \vdash \ldots$ is defined in Figure 6.
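
To make the enforcement judgment concrete, the following Python sketch evaluates the rule language of
Figures 5 and 6 over the two-point lattice and looks up rules in a table transcribed from Figure 4. The
tuple encoding of expressions, the names `eval_label`, `eval_bool`, and `enforce`, and the use of Python
booleans as labels are illustrative assumptions, not part of the formal development.

```python
# Two-point lattice encoded with Python booleans (an illustrative choice).
BOT, TOP = False, True

def join(l1, l2):
    return l1 or l2

def flows(l1, l2):
    return (not l1) or l2

# Rule-language expressions as nested tuples (hypothetical encoding).
E_BOT, TRUE = ("BOT",), ("TRUE",)
LABPC, LAB1, LAB2, LAB3 = ("LAB", "pc"), ("LAB", 1), ("LAB", 2), ("LAB", 3)

def JOIN(e1, e2):
    return ("JOIN", e1, e2)

def FLOWS(e1, e2):
    return ("FLOWS", e1, e2)

def eval_label(expr, rho):
    """Evaluate a label expression LE under the label tuple rho (Figure 6)."""
    kind = expr[0]
    if kind == "BOT":
        return BOT
    if kind == "LAB":
        return rho[expr[1]]
    if kind == "JOIN":
        return join(eval_label(expr[1], rho), eval_label(expr[2], rho))
    raise ValueError(f"unknown label expression: {expr!r}")

def eval_bool(expr, rho):
    """Evaluate a boolean expression BE under rho (Figure 6)."""
    kind = expr[0]
    if kind == "TRUE":
        return True
    if kind == "FLOWS":
        return flows(eval_label(expr[1], rho), eval_label(expr[2], rho))
    if kind == "AND":
        return eval_bool(expr[1], rho) and eval_bool(expr[2], rho)
    if kind == "OR":
        return eval_bool(expr[1], rho) or eval_bool(expr[2], rho)
    raise ValueError(f"unknown boolean expression: {expr!r}")

# Rule table transcribed from Figure 4; None stands for an irrelevant e_r.
RULES = {
    "add":    (TRUE, LABPC, JOIN(LAB1, LAB2)),
    "output": (TRUE, LABPC, JOIN(LAB1, LABPC)),
    "push":   (TRUE, LABPC, E_BOT),
    "load":   (TRUE, LABPC, JOIN(LAB1, LAB2)),
    "store":  (FLOWS(JOIN(LAB1, LABPC), LAB3), LABPC,
               JOIN(JOIN(LAB1, LAB2), LABPC)),
    "jump":   (TRUE, JOIN(LAB1, LABPC), None),
    "bnz":    (TRUE, JOIN(LAB1, LABPC), None),
    "call":   (TRUE, JOIN(LAB1, LABPC), LABPC),
    "ret":    (TRUE, LAB1, None),
}

def enforce(op, l_pc, l1=BOT, l2=BOT, l3=BOT):
    """Return (L_rpc, L_r) if the rule allows the instruction, else None."""
    rho = {"pc": l_pc, 1: l1, 2: l2, 3: l3}
    allow, e_rpc, e_r = RULES[op]
    if not eval_bool(allow, rho):
        return None
    l_r = eval_label(e_r, rho) if e_r is not None else None
    return (eval_label(e_rpc, rho), l_r)
```

For instance, `enforce("store", TOP, BOT, BOT, BOT)` returns `None`: with a high PC label, a store into a
low memory cell violates the `store` side condition.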


5 Concrete Machine 

The concrete machine provides low-level support for efficiently implementing many different high-level 
policies (IFC and others) with a combination of a hardware rule cache and a software cache fault handler. 
In this section we focus on the concrete machine’s hardware, which is completely generic, while in Section 6 
we describe a specific fault handler corresponding to the IFC rules of the symbolic rule machine. 

The concrete machine has the same general structure as the more abstract ones, but differs in several
important respects. One is that it annotates data values with integer tags $T$, rather than with labels $L$ from an
abstract lattice; thus, the concrete atoms $a$ in the data memories and the stack have the form $n@T$. Similarly,
a concrete action $\alpha$ is either a concrete atom or the silent action $\tau$. We consistently use the word label and
variable $L$ to refer to the (abstract, lattice-structured) labels of the abstract and symbolic rule machines and
the word tag and variable $T$ for concrete integers representing labels. Using plain integers as tags allows us
to delegate their interpretation entirely to software. In this paper we focus solely on using tags to implement
IFC labels, although they could also be used for enforcing other policies, such as type and memory safety
or control-flow integrity [9, 35]. For instance, to implement the two-point abstract lattice with $\bot \le \top$,
we could use 0 to represent $\bot$ and 1 to represent $\top$, making the operations $\vee$ and $\le$ easy to implement
(see Section 6). For richer abstract lattices, a more complex concrete representation might be needed; for
example, a label containing an arbitrary set of principals might be represented concretely by a pointer to an
array data structure (see Section 11). In places where a tag is needed but its value is irrelevant, the concrete
machine uses a specific but arbitrary default tag value (e.g., -1), which we write $T_D$.

A second important difference is that the concrete machine has two modes: user mode (u), for executing
the ordinary user program, and kernel mode (k), for handling rule cache faults. To support these two modes,
the concrete machine’s state contains a privilege bit $\pi$, a separate kernel instruction memory $\phi$, and a
separate kernel data memory $\kappa$, in addition to the user instruction memory $\iota$, the user data memory $\mu$,
the stack $\sigma$, and the PC. When the machine is operating in user mode ($\pi = \mathsf{u}$), instructions are looked
up using the PC as an index into $\iota$, and loads and stores use $\mu$; when in kernel mode ($\pi = \mathsf{k}$), the PC
is treated as an index into $\phi$, and loads and stores use $\kappa$. The concrete step relation has the form
$\iota, \phi \vdash \pi_1\ \kappa_1\ \mu_1\ [\sigma_1]\ pc_1 \xrightarrow{\alpha} \pi_2\ \kappa_2\ \mu_2\ [\sigma_2]\ pc_2$. As before, since $\iota$ and $\phi$ are fixed, we normally leave them
implicit when writing down machine transitions.

The concrete machine has the same instruction set as the previous ones, allowing user programs to 
be run on all three machines unchanged. But the tag-related semantics of instructions depends on the 
privilege mode, and in user mode the semantics further depends on the state of the rule cache. In the real
SAFE machine, the rule cache may contain thousands of entries and is implemented as a separate
near-associative memory [33] accessed by special instructions. Here, for simplicity, we use a cache with just
one entry, located at the start of kernel memory, and use Load and Store instructions to manipulate it.
When implementing simple IFC labels such as the two-point lattice defined above, the rule cache is all that
needs to live in $\kappa$. More complex label models, on the other hand, such as those of Section 11, may require
additional memory to store internal data structures.

The rule cache holds a single rule instance, represented graphically like this:

$$\boxed{\mathit{opcode}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_3}\ \boxed{T_{rpc}\ \ T_r}$$

Location 0 holds an integer representing an opcode. (Since the exact choice of representation doesn’t
matter, we will denote each opcode with a lowercase identifier; for example, we might define add = 0,
output = 1, etc.) Location 1 holds the PC tag. Locations 2 to 4 hold the tags of any other arguments needed
by this particular opcode. Location 5 holds the tag that should go on the PC after this instruction executes,
and location 6 holds the tag for the instruction’s result value, if needed. For example, suppose the cache
contains this:

$$\boxed{\mathit{add}\ \ 0\ \ 1\ \ 1\ \ {-1}}\ \boxed{0\ \ 1}$$

(Note that we are showing just the “payload” part of these seven atoms; by convention, the tag part is always
$T_D$, and we do not display it.) This one-line rule cache should be thought of as implementing a (very) partial
function: when the input is add 0 1 1 -1, the output is 0 1; otherwise it is undefined. If 0 is the
tag representing the label $\bot$, 1 represents $\top$, and -1 is the default tag $T_D$, this can be interpreted abstractly
as follows: “If the next instruction is Add, the PC is labeled $\bot$, and the two relevant arguments are both
labeled $\top$, then the instruction should be allowed, the label on the new PC should be $\bot$, and the label on the
result of the operation is $\top$.”
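
The reading of the one-line cache as a partial function can be sketched in a few lines of Python. The
list layout follows the seven locations described above; the opcode numbering `ADD = 0` and the name
`cache_lookup` are assumptions made only for this illustration.

```python
T_D = -1   # default tag, as in the text
ADD = 0    # hypothetical opcode numbering (the text suggests add = 0)

def cache_lookup(cache, opcode, t_pc, t1=T_D, t2=T_D, t3=T_D):
    """Treat a 7-element cache line as a partial function.

    Locations 0-4 are the input part (opcode, PC tag, three argument tags);
    locations 5-6 are the output part (result PC tag, result value tag).
    Returns the output part on a hit and None where the function is undefined.
    """
    if cache[:5] == [opcode, t_pc, t1, t2, t3]:
        return (cache[5], cache[6])
    return None

# The example cache from the text: add 0 1 1 -1 | 0 1
example_cache = [ADD, 0, 1, 1, T_D, 0, 1]
```

With this cache, looking up an Add with PC tag 0 and both argument tags 1 yields `(0, 1)`; any other
input is a miss.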

There are two sets of stepping rules governing the behavior of the concrete machine in user mode; which 
set applies depends on whether the current machine state matches the current contents of the rule cache. In 
the “cache hit” case (Figure 7), the instruction executes normally, with the cache’s output determining the 
new PC tag and result tag (if any). 

In the “cache miss” case (Figure 8), the relevant parts of the current state (opcode, PC tag, argument
tags) are stored into the input part of the single cache line and the machine simulates a Call to the fault
handler.

To see how this works in more detail, consider the two user-mode stepping rules for the Add instruction. 

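$$\dfrac{\iota(n) = \mathit{Add} \qquad \kappa = \boxed{\mathit{add}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_D}\ \boxed{T_{rpc}\ \ T_r}}{\mathsf{u}\ \kappa\ \mu\ [n_1@T_1, n_2@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [(n_1+n_2)@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Add} \qquad \kappa_i \neq \boxed{\mathit{add}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_D} = \kappa_j}{\mathsf{u}\ [\kappa_i, \kappa_o]\ \mu\ [n_1@T_1, n_2@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{k}\ [\kappa_j, \kappa_D]\ \mu\ [(n@T_{pc}, \mathsf{u}); n_1@T_1, n_2@T_2, \sigma]\ 0@T_D}$$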

In the first rule (cache hit), the side condition demands that the input part of the current cache contents have
the form $\boxed{\mathit{add}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_D}$, where $T_{pc}$ is the tag on the current PC, $T_1$ and $T_2$ are the tags on the top
two atoms on the stack, and the fourth element is the default tag. In this case, the output part of the rule,
$\boxed{T_{rpc}\ \ T_r}$, determines the tag $T_{rpc}$ on the PC and the tag $T_r$ on the new atom pushed onto the stack in the
next machine state.

In the second rule (cache miss), the notation $[\kappa_i, \kappa_o]$ means “let $\kappa_i$ be the input part of the current rule
cache and $\kappa_o$ be the output part.” The side condition says that the current input part $\kappa_i$ does not have the
desired form $\boxed{\mathit{add}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_D}$, so the machine needs to enter the fault handler. The next machine state
is formed as follows: (i) the input part of the cache is set to the desired form $\kappa_j$ and the output part is set
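$$\dfrac{\iota(n) = \mathit{Add} \qquad \kappa = \boxed{\mathit{add}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_D}\ \boxed{T_{rpc}\ \ T_r}}{\mathsf{u}\ \kappa\ \mu\ [n_1@T_1, n_2@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [(n_1+n_2)@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Push}\ m \qquad \kappa = \boxed{\mathit{push}\ \ T_{pc}\ \ T_D\ \ T_D\ \ T_D}\ \boxed{T_{rpc}\ \ T_r}}{\mathsf{u}\ \kappa\ \mu\ [\sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [m@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Load} \qquad \mu(p) = m@T_2 \qquad \kappa = \boxed{\mathit{load}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_D}\ \boxed{T_{rpc}\ \ T_r}}{\mathsf{u}\ \kappa\ \mu\ [p@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [m@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Store} \qquad \mu(p) = k@T_3 \qquad \kappa = \boxed{\mathit{store}\ \ T_{pc}\ \ T_1\ \ T_2\ \ T_3}\ \boxed{T_{rpc}\ \ T_r} \qquad \mu(p) \leftarrow (m@T_r) = \mu'}{\mathsf{u}\ \kappa\ \mu\ [p@T_1, m@T_2, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu'\ [\sigma]\ (n+1)@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Jump} \qquad \kappa = \boxed{\mathit{jump}\ \ T_{pc}\ \ T_1\ \ T_D\ \ T_D}\ \boxed{T_{rpc}\ \ \_}}{\mathsf{u}\ \kappa\ \mu\ [n'@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [\sigma]\ n'@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Bnz}\ k \qquad \kappa = \boxed{\mathit{bnz}\ \ T_{pc}\ \ T_1\ \ T_D\ \ T_D}\ \boxed{T_{rpc}\ \ \_} \qquad n' = n + ((m = 0)\ ?\ 1 : k)}{\mathsf{u}\ \kappa\ \mu\ [m@T_1, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [\sigma]\ n'@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Call} \qquad \kappa = \boxed{\mathit{call}\ \ T_{pc}\ \ T_1\ \ T_D\ \ T_D}\ \boxed{T_{rpc}\ \ T_r}}{\mathsf{u}\ \kappa\ \mu\ [n'@T_1, a, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [a, ((n+1)@T_r, \mathsf{u}); \sigma]\ n'@T_{rpc}}$$

$$\dfrac{\iota(n) = \mathit{Ret} \qquad \kappa = \boxed{\mathit{ret}\ \ T_{pc}\ \ T_1\ \ T_D\ \ T_D}\ \boxed{T_{rpc}\ \ \_}}{\mathsf{u}\ \kappa\ \mu\ [(n'@T_1, \mathsf{u}); \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \kappa\ \mu\ [\sigma]\ n'@T_{rpc}}$$

Figure 7: Concrete step relation: user mode, cache hit case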

to $\kappa_D = \boxed{T_D\ \ T_D}$; (ii) a new return frame is pushed on top of the stack to remember the current PC and
privilege bit (u); (iii) the privilege bit is set to k (which will cause the next instruction to be read from the
kernel instruction memory); and (iv) the PC is set to 0, the location in the kernel instruction memory where
the fault handler routine begins.
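
The two user-mode rules for Add can be read operationally as in the following Python sketch. The state
representation (a 7-element list for the cache, a Python list with the top of the stack first, pairs for
atoms, and a `("frame", pc, priv)` triple for return frames) and the name `step_add_user` are assumptions
made for illustration only.

```python
T_D = -1   # default tag
ADD = 0    # hypothetical opcode numbering

def step_add_user(cache, stack, pc):
    """One user-mode step of Add, following the hit and miss rules.

    cache: 7-element list (5 input cells, 2 output cells)
    stack: list with the top first; atoms are (payload, tag) pairs
    pc:    (address, tag) pair
    Returns (privilege, cache, stack, pc) for the next state.
    """
    (n1, t1), (n2, t2), *rest = stack
    n, t_pc = pc
    wanted = [ADD, t_pc, t1, t2, T_D]
    if cache[:5] == wanted:
        # Cache hit: the output part supplies the new PC tag and result tag.
        t_rpc, t_r = cache[5], cache[6]
        return ("u", cache, [(n1 + n2, t_r)] + rest, (n + 1, t_rpc))
    # Cache miss: (i) install the desired input, reset the output to defaults,
    # (ii) push a return frame, (iii) enter kernel mode, (iv) jump to address 0.
    new_cache = wanted + [T_D, T_D]
    frame = ("frame", pc, "u")
    return ("k", new_cache, [frame] + stack, (0, T_D))
```

On a hit the operands are consumed and the sum is pushed; on a miss the operands stay on the stack
beneath the return frame, so the instruction can be restarted once the handler returns.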

What happens next is up to the fault handler code. Its job is to examine the contents of the first five
kernel memory locations and either (i) write appropriate tags for the result and new PC into the sixth and
seventh kernel memory locations and then perform a Ret to go back to user mode and restart the faulting
instruction, or (ii) stop the machine by jumping to an invalid PC (-1) to signal that the attempted combination
of opcode and argument tags is illegal.¹ This mechanism is general and can be used to implement many
different high-level policies (IFC and others).

¹As explained in Section 2, in this work we assume for simplicity that policy violations are fatal. Recent work [45] has shown that
it is possible to recover from IFC violations while preserving noninterference.

In kernel mode (Figure 9), the treatment of tags is almost completely degenerate: to avoid infinite
regress, the concrete machine does not consult the rule cache while in kernel mode. For most instructions,
tags read from the current machine state are ignored (indicated by _) and tags written to the new state are
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u 

k 


i(n) = Add 


Kj add Tp 


T 2 Td 


= Ki 


[Ki,Ko] M [ni@Ti, n2@T2,CT] n@Tpc -4 

[kjjKd] M [(?i@Tpc, u); ni@Ti, ^ 20 X 2 , ct] 0@Td 


i{n) = Push m 


K, 7^ push 

Tpc Td Td Td = K 


U [Ki,Ko] n [a] n@Tpc 

k [Kj,K^] /z [(?z@Tpc, u); cr] 0 @Td 


i(n) = Load a(p) = W'®T2 


Ki 4 load 


Ti T 2 Td 

= 

U [Ki,Ko] A b®Ti, cr] 

k [kjjKd] a [(ti®Tpc,u);p@iTi,cr] 

t(n) = Jump 

n@Tpc —> 
0@Td 

7^ jump Tpc 

Ti 

Td 

Td 

1= 

u [Ki,Ko\ A [n'@Ti,CT] waTpc A- 

k [KjjKo] A [(ti®Tpc, u); n'@Ti, ct] OqTd 

t(n) = Call 

Ki 4 call 


Ti 

Td 

|Td 

= Kj 


U [KijKo] M [n'@Ti,a,(T] n@Tpc ^ 

k [kjjKd] a [(^®Tpc, u); n'@Ti, a, (t] 0<aTD 


i{n) = Store 


store 


m(p) = fc@T 3 

Ti|T 2 |T 3 |=AC, 


U [Ki,Ko] fj. [p<aTi,m@T2,cr] n@Tpc 

k [kj,Kd] M [(ri®Tpc, u);p@Ti, m@T2, cr] OqTd 


T 



(.(n) = Bnz k 
7^ I 


bnz 


Ti Td Td 


U [Ki,Ko\ M [m@Ti,cr] n@Tpc 

k [kjjKd] a [(n®Tpc, u); m@Ti, ct] 0@Td 


T 



i(n) = Ret 

Ki 7 ^ I 


ret 


Ti Td Td 


U [Ki,Ko\ M [(n'@Ti, 7 r); cr] n@Tpc 

k [kjjKd] a [(ri®Tpc, u); (n'@Ti, 7 r); ct] OqTd 


T 



i,(n) = Output 


tii / output T- 


pc 


U [Ki,Ko] n@Tpc 

k [Kj,/«D] [(ri®Tpc, u); m@Ti, (t] 0<aTD 


r 



Figure 8: Concrete step relation: user mode, cache miss case 


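set to $T_D$. This can be seen, for instance, in the kernel-mode step rule for addition:

$$\dfrac{\phi(n) = \mathit{Add}}{\mathsf{k}\ \kappa\ \mu\ [n_1@\_, n_2@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [(n_1+n_2)@T_D, \sigma]\ (n+1)@T_D}$$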

The only significant exceptions to this pattern are Load and Store, which preserve the tag of the datum
being read from or written to memory, and Ret, which takes both the privilege bit and the new PC (including
its tag!) from the return frame at the top of the stack. This is critical, since a Ret instruction is used to return
from kernel to user mode when the fault handler has finished executing.

$$\dfrac{\phi(n) = \mathit{Ret}}{\mathsf{k}\ \kappa\ \mu\ [(n'@T_1, \pi); \sigma]\ n@\_ \xrightarrow{\tau} \pi\ \kappa\ \mu\ [\sigma]\ n'@T_1}$$

A final point is that Output is not permitted in kernel mode, which guarantees that kernel actions are
always the silent action $\tau$.

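As an illustration of how all this works, suppose again that $\iota = [\ldots, \mathit{Add}, \ldots]$, and that the concrete
integer tag 0 represents the abstract label $\bot$, 1 represents $\top$, and -1 is $T_D$. Then, in a cache-hit configuration,
we have (omitting the silent $\tau$ label on transitions):

$$\begin{array}{lll}
\boxed{\mathit{add}\ \ 0\ \ 0\ \ 1\ \ {-1}}\ \boxed{0\ \ 1} & \mu\ [7@0, 5@1] & n@0 \longrightarrow \\
\boxed{\mathit{add}\ \ 0\ \ 0\ \ 1\ \ {-1}}\ \boxed{0\ \ 1} & \mu\ [12@1] & (n+1)@0
\end{array}$$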
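$$\dfrac{\phi(n) = \mathit{Add}}{\mathsf{k}\ \kappa\ \mu\ [n_1@\_, n_2@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [(n_1+n_2)@T_D, \sigma]\ (n+1)@T_D}$$

$$\dfrac{\phi(n) = \mathit{Push}\ m}{\mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [m@T_D, \sigma]\ (n+1)@T_D}$$

$$\dfrac{\phi(n) = \mathit{Load} \qquad \kappa(p) = m@T_1}{\mathsf{k}\ \kappa\ \mu\ [p@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [m@T_1, \sigma]\ (n+1)@T_D}$$

$$\dfrac{\phi(n) = \mathit{Store} \qquad \kappa(p) \leftarrow (m@T_1) = \kappa'}{\mathsf{k}\ \kappa\ \mu\ [p@\_, m@T_1, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa'\ \mu\ [\sigma]\ (n+1)@T_D}$$

$$\dfrac{\phi(n) = \mathit{Jump}}{\mathsf{k}\ \kappa\ \mu\ [n'@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [\sigma]\ n'@T_D}$$

$$\dfrac{\phi(n) = \mathit{Bnz}\ k \qquad n' = n + ((m = 0)\ ?\ 1 : k)}{\mathsf{k}\ \kappa\ \mu\ [m@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [\sigma]\ n'@T_D}$$

$$\dfrac{\phi(n) = \mathit{Call}}{\mathsf{k}\ \kappa\ \mu\ [n'@\_, a, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \kappa\ \mu\ [a, ((n+1)@T_D, \mathsf{k}); \sigma]\ n'@T_D}$$

$$\dfrac{\phi(n) = \mathit{Ret}}{\mathsf{k}\ \kappa\ \mu\ [(n'@T_1, \pi); \sigma]\ n@\_ \xrightarrow{\tau} \pi\ \kappa\ \mu\ [\sigma]\ n'@T_1}$$

Figure 9: Concrete step relation (kernel mode)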


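On the other hand, if the tags on both operands are 1 (i.e., $\top$), then the first step will miss in the cache and
reduction will proceed as follows:

$$\begin{array}{lllll}
\mathsf{u} & \boxed{\mathit{add}\ \ 0\ \ 0\ \ 1\ \ {-1}}\ \boxed{0\ \ 1} & \mu\ [7@1, 5@1] & n@0 \longrightarrow & \text{(cache miss)} \\
\mathsf{k} & \boxed{\mathit{add}\ \ 0\ \ 1\ \ 1\ \ {-1}}\ \boxed{{-1}\ \ {-1}} & \mu\ [(n@0, \mathsf{u}); 7@1, 5@1] & 0@{-1} \longrightarrow & \text{(call fault handler, kernel mode)} \\
 & \ldots\ \text{fault handler runs}\ \ldots & & & \\
\mathsf{k} & \boxed{\mathit{add}\ \ 0\ \ 1\ \ 1\ \ {-1}}\ \boxed{0\ \ 1} & \mu\ [(n@0, \mathsf{u}); 7@1, 5@1] & k@{-1} \longrightarrow & \text{(fault handler returns to user mode)} \\
\mathsf{u} & \boxed{\mathit{add}\ \ 0\ \ 1\ \ 1\ \ {-1}}\ \boxed{0\ \ 1} & \mu\ [7@1, 5@1] & n@0 \longrightarrow & \text{(restarts instruction, cache now hits)} \\
\mathsf{u} & \boxed{\mathit{add}\ \ 0\ \ 1\ \ 1\ \ {-1}}\ \boxed{0\ \ 1} & \mu\ [12@1] & (n+1)@0 &
\end{array}$$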


6 Fault Handler for IFC 

Now we assemble the pieces. A concrete IFC machine implementing the symbolic rule machine defined in 
Section 4 can be obtained by installing appropriate fault handler code in the kernel instruction memory of 
the concrete machine presented in Section 5. In essence, this handler must emulate how the symbolic rule 
machine looks up and evaluates the DSL expressions in a given IFC rule table. We choose to generate the
handler code by compiling the lookup and DSL evaluation relations directly into machine code. (An
alternative would be to represent the rule table as abstract syntax in the kernel memory and write an interpreter
in machine code for the DSL, but the compilation approach seems to lead to simpler code and proofs.)

The handler compilation scheme is given in Figure 10. Each gen* function generates a list of concrete 
machine instructions; the sequence generated by the top-level genFaultHandler is intended to be installed 
starting at location 0 in the concrete machine’s kernel instruction memory. The implicit addr* parameters 
are symbolic names for the locations of the opcode and various tags in the concrete machine’s rule cache, as 
described in Section 5. The entire generator is parameterized by an arbitrary rule table $\mathcal{R}$. We make heavy
use of the (obvious) encoding of booleans where false is represented by 0 and true by any non-zero value. 

The top-level handler works in three phases. The first phase, genComputeResults, does most of the
work: it consists of a large nested if-then-else chain, built using genIndexedCases, that compares the opcode
of the faulting instruction against each possible opcode and, on a match, executes the code generated for
the corresponding symbolic IFC rule. The code generated for each symbolic IFC rule (by genApplyRule)
pushes its results onto the stack: a flag indicating whether the instruction is allowed and, if so, the result-PC
and result-value tags. This first phase never writes to memory or transfers control outside the handler; this
makes it fairly easy to prove correct.

The second phase of the top-level handler, genStoreResults, reads the computed results off the stack 
and updates the rule cache appropriately. If the result indicates that the instruction is allowed, the result PC 
and value tags are written to the cache, and true is pushed on the stack; otherwise, nothing is written to the 
cache, and false is pushed on the stack. 

The third and final phase of the top-level handler tests the boolean just pushed onto the stack and either 
returns to user code (instruction is allowed) or jumps to address -1 (disallowed). 

The code for symbolic rule compilation is built by straightforward recursive traversal of the rule DSL
syntax for label-valued expressions (genELab) and boolean-valued expressions (genBool). These
functions are (implicitly) parameterized by the definitions of lattice-specific generators genBot, genJoin, and
genFlows. To implement these generators for a particular lattice, we first need to choose how to represent
abstract labels as integer tags, and then determine a sequence of instructions that encodes each operation.
We call such an encoding scheme a concrete lattice. For example, the abstract labels in the two-point
lattice can be encoded like booleans, representing $\bot$ by 0, $\top$ by non-0, and instantiating genBot, genJoin,
and genFlows with code for computing false, disjunction, and implication, respectively. A simple concrete
lattice like this can be formalized as a tuple CL = (Tag, Lab, genBot, genJoin, genFlows), where the
encoding and decoding functions Lab and Tag satisfy $\mathit{Lab} \circ \mathit{Tag} = \mathit{id}$; to streamline the exposition, we assume
this form of concrete lattice for most of the paper. The more realistic encoding in Section 11 will require a
more complex treatment.
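
As a sanity check on the boolean encoding just described, the Python sketch below models the two-point
concrete lattice at the level of integer tags rather than machine code. The functions `tag` and `lab`
play the roles of Tag and Lab; `tag_join` and `tag_flows` compute disjunction and implication on tags.
All names and the string representation of abstract labels are assumptions for illustration.

```python
BOT, TOP = "bot", "top"   # abstract labels (hypothetical representation)

def abs_join(l1, l2):
    return TOP if TOP in (l1, l2) else BOT

def abs_flows(l1, l2):
    return l1 == BOT or l2 == TOP

def tag(label):
    """Encode an abstract label as an integer tag (bottom is 0, top is 1)."""
    return 0 if label == BOT else 1

def lab(t):
    """Decode a tag; any non-zero tag is read as top."""
    return BOT if t == 0 else TOP

def tag_bot():
    return 0                                   # "false"

def tag_join(t1, t2):
    return 1 if (t1 != 0 or t2 != 0) else 0    # disjunction

def tag_flows(t1, t2):
    return 1 if (t1 == 0 or t2 != 0) else 0    # implication t1 -> t2

def encoding_is_correct():
    """Check Lab . Tag = id and agreement of the tag-level operations."""
    labels = [BOT, TOP]
    ok = all(lab(tag(l)) == l for l in labels) and lab(tag_bot()) == BOT
    for a in labels:
        for b in labels:
            ok = ok and lab(tag_join(tag(a), tag(b))) == abs_join(a, b)
            ok = ok and (tag_flows(tag(a), tag(b)) != 0) == abs_flows(a, b)
    return ok
```

Because the lattice has only two elements, `encoding_is_correct()` checks every case exhaustively.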

To raise the level of abstraction of the handler code, we make heavy use of structured code generators; 
this makes it easier both to understand the code and to prove it correct using a custom Hoare logic that 
follows the structure of the generators (see Section 7). For example, the genIf function takes two code
sequences, representing the “then” and “else” branches of a conditional, and generates code to test the top 
of the stack and dispatch control appropriately. The higher-order generator genIndexedCases takes a list 
of integer indices (e.g., opcodes) and functions for generating guards and branch bodies from an index, 
and generates code that will run the guards in order until one of them computes true, at which point the 
corresponding branch body is run. 
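
The following Python sketch transcribes a few of the structured generators of Figure 10 and runs their
output on a tiny interpreter, to show how genIf lays out relative branches. The tuple encoding of
instructions and the interpreter `run`, which handles only Push and Bnz and ignores tags, are assumptions
for illustration; they are not the verified Coq artifacts.

```python
def gen_true():
    return [("Push", 1)]

def gen_false():
    return [("Push", 0)]

def gen_pop():
    return [("Bnz", 1)]           # pops; moves to the next instruction either way

def gen_skip_if(n):
    return [("Bnz", n + 1)]       # if top is non-zero, skip the next n instructions

def gen_skip(n):
    return gen_true() + gen_skip_if(n)

def gen_if(then_code, else_code):
    # Layout: test, else-branch (ending with a skip over then-branch), then-branch.
    else_ext = else_code + gen_skip(len(then_code))
    return gen_skip_if(len(else_ext)) + else_ext + then_code

def gen_not():
    return gen_if(gen_false(), gen_true())

def gen_and():
    return gen_if([], gen_pop() + gen_false())

def gen_or():
    return gen_if(gen_pop() + gen_true(), [])

def gen_impl():
    return gen_not() + gen_or()

def run(code, stack):
    """Run self-contained code until the PC falls off the end.

    stack is a list of integers with the top first. Bnz k pops m and moves
    the PC by 1 if m == 0 and by k otherwise, as in the machine semantics.
    """
    stack = list(stack)
    pc = 0
    while pc != len(code):
        if not 0 <= pc < len(code):
            raise RuntimeError("control escaped the code sequence")
        op, arg = code[pc]
        if op == "Push":
            stack.insert(0, arg)
            pc += 1
        elif op == "Bnz":
            m = stack.pop(0)
            pc += 1 if m == 0 else arg
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack
```

For example, `gen_not()` expands to the five instructions Bnz 4, Push 1, Push 1, Bnz 2, Push 0; all
jumps are relative, which is why generated fragments can be concatenated freely.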


7 Correctness of the Fault Handler Generator 

We now turn our attention to verification, beginning with the fault handler. We must show that the
generated fault handler emulates the IFC enforcement judgment $\vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, L_3) \leadsto_{opcode} L_{rpc}, L_r$ of the
symbolic rule machine. The statement and proof of correctness are parametric over the symbolic IFC rule
table $\mathcal{R}$ and concrete lattice, and hence over correctness lemmas for the lattice operations.


Correctness statement Let $\mathcal{R}$ be an arbitrary rule table and $\phi_{\mathcal{R}} = \mathit{genFaultHandler}\ \mathcal{R}$ be the
corresponding generated fault handler. We specify how $\phi_{\mathcal{R}}$ behaves as a whole (as a relation between initial
state on entry and final state on completion) using the relation $\phi \vdash cs_1 \rightarrow^*_{\mathsf{k}} cs_2$, defined as the reflexive
transitive closure of the concrete step relation, with the constraints that the fault handler code is $\phi$ and all
intermediate states (i.e., strictly preceding $cs_2$) have privilege bit k.

The correctness statement is captured by the following two lemmas. Intuitively, if the symbolic IFC
enforcement judgment allows some given user instruction, then executing $\phi_{\mathcal{R}}$ (stored at kernel mode location
0) updates the cache to contain the tag encoding of the appropriate result labels and returns to user mode;
otherwise, $\phi_{\mathcal{R}}$ halts the machine (pc = -1).


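Lemma 7.1 (Fault handler correctness, allowed case).
Suppose that $\vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, L_3) \leadsto_{opcode} L_{rpc}, L_r$ and

$$\kappa_i = \boxed{opcode\ \ \mathit{Tag}(L_{pc})\ \ \mathit{Tag}(L_1)\ \ \mathit{Tag}(L_2)\ \ \mathit{Tag}(L_3)}.$$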
    genFaultHandler R = genComputeResults R ++
                        genStoreResults ++
                        genIf [Ret] [Push (-1); Jump]

    genComputeResults R =
        genIndexedCases [] genMatchOp (genApplyRule ∘ Rule_R) opcodes

    genMatchOp op =
        [Push op] ++ genLoadFrom addrOpLabel ++ genEqual
    genEqual = [Sub] ++ genNot

    genApplyRule (allow, e_rpc, e_r) = genBool allow ++
        genIf (genSome (genELab e_rpc ++ genELab e_r)) genNone

    genELab BOT        = genBot
            LAB_i      = genLoadFrom addrTag_i
            LE1 ⊔ LE2  = genELab LE2 ++ genELab LE1 ++ genJoin

    genBool TRUE       = genTrue
            LE1 ⊑ LE2  = genELab LE2 ++ genELab LE1 ++ genFlows

    genStoreResults =
        genIf (genStoreAt addrTag_r ++ genStoreAt addrTag_rpc ++ genTrue)
              genFalse

    genFalse   = [Push 0]
    genTrue    = [Push 1]
    genAnd     = genIf [] (genPop ++ genFalse)
    genOr      = genIf (genPop ++ genTrue) []
    genNot     = genIf genFalse genTrue
    genImpl    = genNot ++ genOr
    genSome c  = c ++ genTrue
    genNone    = genFalse

    genIndexedCases genDefault genGuard genBody = g
        where g []        = genDefault
              g (n :: ns) = genGuard n ++ genIf (genBody n) (g ns)

    genIf t f = genSkipIf (length f') ++ f' ++ t
        where f' = f ++ genSkip (length t)
    genSkip n   = genTrue ++ genSkipIf n
    genSkipIf n = [Bnz (n+1)]

    genStoreAt p  = [Push p; Store]
    genLoadFrom p = [Push p; Load]
    genPop        = [Bnz 1]

    opcodes = [add; output; ...; ret]

Figure 10: Generation of fault handler from IFC rule table.
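Then

$$\phi_{\mathcal{R}} \vdash (\mathsf{k}\ [\kappa_i, \kappa_o]\ \mu\ [(pc, \mathsf{u}); \sigma]\ 0@T_D) \rightarrow^*_{\mathsf{k}} (\mathsf{u}\ [\kappa_i, \kappa_o']\ \mu\ [\sigma]\ pc)$$

with output cache $\kappa_o' = (\mathit{Tag}(L_{rpc}), \mathit{Tag}(L_r))$.

Lemma 7.2 (Fault handler correctness, disallowed case). Suppose that $\vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, L_3) \not\leadsto_{opcode}$, and

$$\kappa_i = \boxed{opcode\ \ \mathit{Tag}(L_{pc})\ \ \mathit{Tag}(L_1)\ \ \mathit{Tag}(L_2)\ \ \mathit{Tag}(L_3)}.$$

Then, for some final stack $\sigma'$,

$$\phi_{\mathcal{R}} \vdash (\mathsf{k}\ [\kappa_i, \kappa_o]\ \mu\ [(pc, \mathsf{u}); \sigma]\ 0@T_D) \rightarrow^*_{\mathsf{k}} (\mathsf{k}\ [\kappa_i, \kappa_o]\ \mu\ [\sigma']\ {-1}@T_D).$$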


Proof methodology The fault handler is compiled by composing generators (Figure 10); accordingly, the
proofs of these two lemmas reduce to correctness proofs for the generators. We employ a custom Hoare
logic for specifying the generators themselves, which makes the code generation proof simple, reusable,
and scalable. This is where defining a DSL for IFC rules and a structured compiler proves to be a very useful
approach, e.g., compared to symbolic interpretation of hand-written code.

Our logic comprises two kinds of Hoare triples. The generated code mostly consists of self-contained
instruction sequences that terminate by “falling off the end”, i.e., that never return or jump outside
themselves, although they may contain internal jumps (e.g., to implement conditionals). The only exception is
the final step of the handler (third line of genFaultHandler in Figure 10). We therefore define a standard
Hoare triple $\{P\}\ c\ \{Q\}$, suitable for reasoning about self-contained code, and use it for the bulk of the
proof. To specify the final handler step, we define a non-standard triple $\{P\}\ c\ \{Q\}^{O}_{pc}$ for reasoning about
escaping code.

Self-contained-code Hoare triples The triple $\{P\}\ c\ \{Q\}$, where $P$ and $Q$ are predicates on $\kappa \times \sigma$, says
that, if the kernel instruction memory $\phi$ contains the code sequence $c$ starting at the current PC, and if
the current memory and stack satisfy $P$, then the machine will run (in kernel mode) until the PC points to
the instruction immediately following the sequence $c$, with a resulting memory and stack satisfying $Q$. In
symbols:

$$\{P\}\ c\ \{Q\} \;\triangleq\; c = \phi(n), \ldots, \phi(n'-1) \wedge P(\kappa, \sigma) \Longrightarrow \exists \kappa'\, \sigma'.\ Q(\kappa', \sigma') \wedge \phi \vdash (\mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@T_D) \rightarrow^*_{\mathsf{k}} (\mathsf{k}\ \kappa'\ \mu\ [\sigma']\ n'@T_D)$$

Note that the instruction memory $\phi$ is unconstrained outside of $c$, so if $c$ is not self-contained, no triple
about it will be provable; thus, these triples obey the usual composition laws (e.g., the rule of consequence).

$$\dfrac{\forall \kappa\, \sigma.\ P'(\kappa, \sigma) \Longrightarrow P(\kappa, \sigma) \qquad \forall \kappa\, \sigma.\ Q(\kappa, \sigma) \Longrightarrow Q'(\kappa, \sigma) \qquad \{P\}\ c\ \{Q\}}{\{P'\}\ c\ \{Q'\}} \qquad \dfrac{\{P_1\}\ c_1\ \{P_2\} \qquad \{P_2\}\ c_2\ \{P_3\}}{\{P_1\}\ c_1 \mathbin{+\!\!+} c_2\ \{P_3\}}$$

Also, because the concrete machine is deterministic, these triples express total, rather than partial,
correctness, which is essential for proving termination in Lemma 7.1 and Lemma 7.2. To aid automation of proofs
about code sequences, we give triples in weakest-precondition style.

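We build proofs by composing atomic specifications of individual instructions, such as

$$\dfrac{P(\kappa, \sigma) := \exists n_1\, T_1\, n_2\, T_2\, \sigma'.\ \sigma = n_1@T_1, n_2@T_2, \sigma' \wedge Q(\kappa, ((n_1+n_2)@T_D, \sigma'))}{\{P\}\ [\mathit{Add}]\ \{Q\}},$$

with specifications for structured code generators, such as

$$\dfrac{P(\kappa, \sigma) := \exists n\, T\, \sigma'.\ \sigma = n@T, \sigma' \wedge (n \neq 0 \Longrightarrow P_1(\kappa, \sigma')) \wedge (n = 0 \Longrightarrow P_2(\kappa, \sigma')) \qquad \{P_1\}\ c_1\ \{Q\} \qquad \{P_2\}\ c_2\ \{Q\}}{\{P\}\ \mathit{genIf}\ c_1\ c_2\ \{Q\}}$$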
(We emphasize that all such specifications are verified, not axiomatized as the inference rule notation might
suggest.) We also prove a specification for the specialized case statement genIndexedCases. Although this 
specification is quite complex when written in full detail (and thus omitted here), it is intuitively simple: 
given a list of indices and functions for generating guards and branches from the indices, genIndexedCases 
will run the guards in order until one of them computes true (more precisely, its integer encoding 1), at 
which point the corresponding branch is run. 

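The concrete implementations of the lattice operations are also specified using triples in this style.

$$\dfrac{P(\kappa, \sigma) := Q(\kappa, (\mathit{Tag}(\bot)@T_D, \sigma))}{\{P\}\ \mathit{genBot}\ \{Q\}}$$

$$\dfrac{P(\kappa, \sigma) := \exists L\, L'\, \sigma'.\ \sigma = \mathit{Tag}(L)@T_D, \mathit{Tag}(L')@T_D, \sigma' \wedge Q(\kappa, (\mathit{Tag}(L \vee L')@T_D, \sigma'))}{\{P\}\ \mathit{genJoin}\ \{Q\}}$$

$$\dfrac{P(\kappa, \sigma) := \exists L\, L'\, \sigma'.\ \sigma = \mathit{Tag}(L)@T_D, \mathit{Tag}(L')@T_D, \sigma' \wedge Q(\kappa, ((\text{if } L \le L' \text{ then } 1 \text{ else } 0)@T_D, \sigma'))}{\{P\}\ \mathit{genFlows}\ \{Q\}}$$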

For the two-point lattice, it is easy to prove that the implemented operators satisfy these specifications; 
Section 11 describes an analogous result for a lattice of sets of principals. 

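Going a bit further towards bridging the gap between the symbolic rule and concrete machines, we
prove specifications for the generation of label expressions

$$\dfrac{(L_{pc}, L_1, L_2, L_3) \vdash LE \downarrow L \qquad \kappa_0 = \boxed{op\ \ \mathit{Tag}(L_{pc})\ \ \mathit{Tag}(L_1)\ \ \mathit{Tag}(L_2)\ \ \mathit{Tag}(L_3)} \qquad P(\kappa, \sigma) := \kappa = \kappa_0 \wedge Q(\kappa, (\mathit{Tag}(L)@T_D, \sigma))}{\{P\}\ \mathit{genELab}\ LE\ \{Q\}}$$

and for the code generated to implement the application of a symbolic IFC rule. For instance, the
case where the instruction is allowed is described by the following specification (the integer 1 pushed on
the output stack encodes the fact that the rule is allowed):

$$\dfrac{\begin{array}{c} \mathit{Rule}_{\mathcal{R}}(op) = (\mathit{allow}, e_{rpc}, e_r) \qquad \kappa_0 = \boxed{op\ \ \mathit{Tag}(L_{pc})\ \ \mathit{Tag}(L_1)\ \ \mathit{Tag}(L_2)\ \ \mathit{Tag}(L_3)} \qquad \vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, L_3) \leadsto_{op} L_{rpc}, L_r \\ P(\kappa, \sigma) := \kappa = \kappa_0 \wedge Q(\kappa, (1@T_D, \mathit{Tag}(L_r)@T_D, \mathit{Tag}(L_{rpc})@T_D, \sigma)) \end{array}}{\{P\}\ \mathit{genApplyRule}\ (\mathit{allow}, e_{rpc}, e_r)\ \{Q\}}$$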

Escaping-code Hoare triples To be able to specify the entire code of the generated fault handler, we also
define a second form of triple, $\{P\}\ c\ \{Q\}^{O}_{pc}$, which specifies mostly self-contained, total code $c$ that either
makes exactly one jump outside of $c$ or returns out of kernel mode. This non-locality is needed because the
fault handler checks whether an information-flow violation is about to occur, and returns to the user-mode
caller if not, or jumps to an invalid address otherwise. More precisely, if $P$ and $Q$ are predicates on $\kappa \times \sigma$
and $O$ is a function from $\kappa \times \sigma$ to outcomes (the constants Success and Failure), then $\{P\}\ c\ \{Q\}^{O}_{pc}$
holds if, whenever the kernel instruction memory $\phi$ contains the sequence $c$ starting at the current PC and the
current cache and stack satisfy $P$, the following holds:

• if $O$ computes Success then the machine runs (in kernel mode) until it returns to user code at $pc$,
and $Q$ is satisfied.

• if $O$ computes Failure then the machine runs (in kernel mode) until it halts ($pc = -1$ in kernel
mode), and $Q$ is satisfied.

Or, in symbols,

$$\begin{array}{l}
\{P\}\ c\ \{Q\}^{O}_{pc} \;\triangleq\; c = \phi(n), \ldots, \phi(n+|c|-1) \wedge P(\kappa, \sigma) \Longrightarrow \exists \kappa'\, \sigma'.\ Q(\kappa', \sigma') \\
\quad \wedge\ (O(\kappa, \sigma) = \mathit{Success} \Longrightarrow \phi \vdash (\mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@T_D) \rightarrow^*_{\mathsf{k}} (\mathsf{u}\ \kappa'\ \mu\ [\sigma']\ pc)) \\
\quad \wedge\ (O(\kappa, \sigma) = \mathit{Failure} \Longrightarrow \phi \vdash (\mathsf{k}\ \kappa\ \mu\ [\sigma]\ n@T_D) \rightarrow^*_{\mathsf{k}} (\mathsf{k}\ \kappa'\ \mu\ [\sigma']\ {-1}@T_D))
\end{array}$$
To compose self-contained code with escaping code, we prove two composition laws for these triples,
one for pre-composing with specified self-contained code and another for post-composing with arbitrary
(unreachable) code:

$$\dfrac{\{P_1\}\ c_1\ \{P_2\} \qquad \{P_2\}\ c_2\ \{P_3\}^{O}_{pc}}{\{P_1\}\ c_1 \mathbin{+\!\!+} c_2\ \{P_3\}^{O}_{pc}} \qquad \dfrac{\{P\}\ c_1\ \{Q\}^{O}_{pc}}{\{P\}\ c_1 \mathbin{+\!\!+} c_2\ \{Q\}^{O}_{pc}}$$

We use these new triples to specify the Ret and Jump instructions, which could not be given useful
specifications using the self-contained-code triples:

$$\dfrac{P(\kappa, \sigma) := \exists \sigma'.\ Q(\kappa, \sigma') \wedge \sigma = (pc, \mathsf{u}); \sigma' \qquad O(\kappa, \sigma) := \mathit{Success}}{\{P\}\ [\mathit{Ret}]\ \{Q\}^{O}_{pc}} \qquad \dfrac{P(\kappa, \sigma) := \exists \sigma'.\ Q(\kappa, \sigma') \wedge \sigma = (-1)@\_, \sigma' \qquad O(\kappa, \sigma) := \mathit{Failure}}{\{P\}\ [\mathit{Jump}]\ \{Q\}^{O}_{pc}}$$

Everything comes together in verifying the fault handler. We use self-contained-code triples to specify
everything except for [Ret], [Jump], and the final genIf, and then use the escaping-code triple composition
laws to connect the non-returning part of the fault handler to the final genIf.

8 Refinement 

We have two remaining verification goals. First, we want to show that the concrete machine of Section 5
(running the fault handler of Section 6 compiled from the rule table of Figure 4) enjoys TINI. Proving this
directly for the concrete machine would be dauntingly complex, so instead we show that the concrete machine
is an implementation of the abstract machine, for which noninterference will be much easier to prove (Section 10).
Second, since a trivial always-diverging machine also has TINI, we want to show that the concrete machine
is a faithful implementation of the abstract machine that emulates all its behaviors.

We phrase these two results using the notion of machine refinement, which we develop in this section,
and which we prove in Section 10 to be TINI preserving. In Section 9, we prove a two-way refinement (one
direction for each goal), between the abstract and concrete machines, via the symbolic rule machine in both
directions.

From here on we sometimes mention different machines (abstract, symbolic rule, or concrete) in the
same statement (e.g., when discussing refinement), and sometimes talk about machines generically (e.g.,
when defining TINI for all our machines); for these purposes, it is useful to define a generic notion of
machine.

Definition 8.1. A generic machine (or just machine) is a 5-tuple $M = (S, E, I, \cdot \xrightarrow{\cdot} \cdot, \mathit{Init})$, where $S$ is a set
of states (ranged over by $s$), $E$ is a set of events (ranged over by $e$), $\cdot \xrightarrow{\cdot} \cdot \subseteq S \times (E + \{\tau\}) \times S$ is a step
relation, and $I$ is a set of input data (ranged over by $i$) that can be used to build initial states of the machine
with the function $\mathit{Init} \in I \to S$. We call $E + \{\tau\}$ the set of actions of $M$ (ranged over by $\alpha$).

Conceptually, a machine’s program is included in its input data and gets “loaded” by the function Init, 
which also initializes the machine memory, stack, and PC. The notion of generic machine abstracts all these 
details, allowing uniform definitions of refinement and TINI that apply to all three of our IFC machines. To 
avoid stating it several times below, we stipulate that when we instantiate Definition 8.1 to any of our IFC 
machines, Init must produce an initial stack with no return frames. 

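A generic step $s_1 \xrightarrow{e} s_2$ or $s_1 \xrightarrow{\tau} s_2$ produces event $e$ or is silent. The reflexive-transitive closure of
such steps, omitting silent steps (written $s_1 \xrightarrow{t}{}^{*} s_2$), produces traces, i.e., lists $t$ of events. It is defined
inductively by

$$\dfrac{}{s \xrightarrow{\epsilon}{}^{*} s} \qquad \dfrac{s_1 \xrightarrow{e} s_2 \qquad s_2 \xrightarrow{t}{}^{*} s_3}{s_1 \xrightarrow{e.t}{}^{*} s_3} \qquad \dfrac{s_1 \xrightarrow{\tau} s_2 \qquad s_2 \xrightarrow{t}{}^{*} s_3}{s_1 \xrightarrow{t}{}^{*} s_3}$$

where we write $\epsilon$ for the empty trace and $e.t$ for consing $e$ to $t$. When the end state of a step starting in state
$s$ is not relevant we write $s \xrightarrow{e}$, and similarly $s \xrightarrow{t}{}^{*}$ for traces.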
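
The way traces drop silent steps can be illustrated with a small Python sketch. It specializes the
(relational) definition to a deterministic machine given by a step function, which is a simplifying
assumption; the names `TAU`, `run_trace`, and the toy `counter_step` machine are hypothetical.

```python
TAU = None   # the silent action

def run_trace(step, state, max_steps=1000):
    """Collect the trace of a deterministic machine.

    step(state) returns (action, next_state), or None when the machine is stuck.
    Silent actions are omitted from the trace, mirroring the third rule above.
    """
    events = []
    for _ in range(max_steps):
        result = step(state)
        if result is None:
            break
        action, state = result
        if action is not TAU:
            events.append(action)
    return events, state

def counter_step(s):
    """Toy machine: counts from its input up to 5, emitting only even numbers."""
    if s >= 5:
        return None
    action = s if s % 2 == 0 else TAU
    return (action, s + 1)
```

Starting the toy machine in state 0 takes five steps but yields the three-event trace `[0, 2, 4]`.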

When relating executions of two different machines through a refinement, we establish a correspondence
between their traces. This relation is usually derived from an elementary relation on events, $\triangleright \subseteq E_1 \times E_2$,
which is lifted to actions and traces:





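Definition 8.2 (Matching). Given a relation $\triangleright \subseteq E_1 \times E_2$ between two sets of events, its lifts to actions
and traces are defined:

$$\alpha_1\ [\triangleright]\ \alpha_2 \;\triangleq\; (\alpha_1 = \tau = \alpha_2 \;\vee\; \alpha_1 = e_1 \triangleright e_2 = \alpha_2)$$

$$\vec{x}\ [\triangleright]\ \vec{y} \;\triangleq\; \mathit{length}(\vec{x}) = \mathit{length}(\vec{y}) \wedge \forall i < \mathit{length}(\vec{x}).\ x_i \triangleright y_i$$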
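
The two lifts can be written directly as Python predicates. In this sketch an event relation is any
two-argument boolean function; `TAU`, `match_action`, `match_traces`, and the example relation
`same_payload_and_tag` (which relates a labeled abstract output to a tagged concrete output) are
illustrative assumptions.

```python
TAU = None   # the silent action

def match_action(rel, a1, a2):
    """Lift an event relation to actions: both silent, or both related events."""
    if a1 is TAU or a2 is TAU:
        return a1 is TAU and a2 is TAU
    return rel(a1, a2)

def match_traces(rel, t1, t2):
    """Lift an event relation to traces: equal length and pointwise related."""
    return len(t1) == len(t2) and all(rel(x, y) for x, y in zip(t1, t2))

def same_payload_and_tag(abstract_event, concrete_event):
    """Example relation: (n, label) matches (n, tag) when tag encodes label."""
    encode = {"bot": 0, "top": 1}            # hypothetical two-point encoding
    (n1, label), (n2, tag) = abstract_event, concrete_event
    return n1 == n2 and encode[label] == tag
```

Note that a silent action never matches an event, so two matching traces must have the same number of
visible outputs.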


We are now ready to define refinement. 


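Definition 8.3 (Refinement). Let $M_1 = (S_1, E_1, I_1, \cdot \xrightarrow{\cdot}_1 \cdot, \mathit{Init}_1)$ and $M_2 = (S_2, E_2, I_2, \cdot \xrightarrow{\cdot}_2 \cdot, \mathit{Init}_2)$
be two machines. A refinement of $M_1$ into $M_2$ is a pair of relations $(\triangleright_i, \triangleright_e)$, where $\triangleright_i \subseteq I_1 \times I_2$ and
$\triangleright_e \subseteq E_1 \times E_2$, such that whenever $i_1 \triangleright_i i_2$ and $\mathit{Init}_2(i_2) \xrightarrow{t_2}{}^{*}$, there exists a trace $t_1$ such that $\mathit{Init}_1(i_1) \xrightarrow{t_1}{}^{*}$
and $t_1\ [\triangleright_e]\ t_2$. We also say that $M_2$ refines $M_1$. Graphically:

$$\begin{array}{ccc}
i_1 & \quad & \mathit{Init}_1(i_1) \overset{t_1}{\dashrightarrow}{}^{*} \\
\triangleright_i & & [\triangleright_e] \ \text{(dashed)} \\
i_2 & & \mathit{Init}_2(i_2) \xrightarrow{t_2}{}^{*}
\end{array}$$

(Plain lines denote premises, dashed ones conclusions.)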

In order to prove refinement, we need a variant that considers executions starting at arbitrary related 
states. 

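Definition 8.4 (Refinement via states). Let $M_1$, $M_2$ be as above. A state refinement of $M_1$ into $M_2$ is a
pair of relations $(\triangleright_s, \triangleright_e)$, where $\triangleright_s \subseteq S_1 \times S_2$ and $\triangleright_e \subseteq E_1 \times E_2$, such that, whenever $s_1 \triangleright_s s_2$ and $s_2 \xrightarrow{t_2}{}^{*}$,
there exists $t_1$ such that $s_1 \xrightarrow{t_1}{}^{*}$ and $t_1\ [\triangleright_e]\ t_2$.

$$\begin{array}{ccc}
s_1 & \overset{t_1}{\dashrightarrow}{}^{*} & \\
\triangleright_s & & [\triangleright_e] \ \text{(dashed)} \\
s_2 & \xrightarrow{t_2}{}^{*} &
\end{array}$$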

If the relation on inputs is compatible with the one on states, we can use state refinement to prove 
refinement. 

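Lemma 8.5. Suppose $i_1 \triangleright_i i_2 \Rightarrow Init_1(i_1) \triangleright_s Init_2(i_2)$, for all $i_1$ and $i_2$. If $(\triangleright_s, \triangleright_e)$ is a state refinement then $(\triangleright_i, \triangleright_e)$ is a refinement.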

Our plan to derive a refinement between the abstract and concrete machines via the symbolic rule 
machine requires composition of refinements. 

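Lemma 8.6 (Refinement Composition). Let $(\triangleright_i^{12}, \triangleright_e^{12})$ be a refinement between $M_1$ and $M_2$, and $(\triangleright_i^{23}, \triangleright_e^{23})$ a refinement between $M_2$ and $M_3$. The pair $(\triangleright_i^{23} \circ \triangleright_i^{12},\ \triangleright_e^{23} \circ \triangleright_e^{12})$ that composes the matching relations for initial data and events on each layer is a refinement between $M_1$ and $M_3$. This can be summarized in the following diagram:

[Diagram: inputs $i_1$, $i_2$, $i_3$ related pairwise by $\triangleright_i^{12}$ and $\triangleright_i^{23}$; the executions from $i_3$, $i_2$, and $i_1$ emit traces $t_3$, $t_2$, and $t_1$, related pairwise by $[\triangleright_e^{23}]$ and $[\triangleright_e^{12}]$, hence by $[\triangleright_e^{23} \circ \triangleright_e^{12}]$.]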
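The composition of matching relations used in Lemma 8.6 is ordinary relational composition. A minimal sketch, assuming finite relations represented as sets of pairs (the function name `compose` and the example encodings are hypothetical):

```python
def compose(r12, r23):
    """Relate x to z whenever some y has (x, y) in r12 and (y, z) in r23."""
    return {(x, z) for (x, y) in r12 for (y2, z) in r23 if y == y2}


# Toy layers: labels, then small integer codes, then bit strings.
labels_to_codes = {("L", 0), ("H", 1)}
codes_to_bits = {(0, "00"), (1, "01")}
labels_to_bits = compose(labels_to_codes, codes_to_bits)
```

Here `labels_to_bits` relates each label directly to its bit string, skipping the middle layer.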
9 Refinements Between Concrete and Abstract


In this section, we show that (1) the concrete machine refines the symbolic rule machine, and (2) vice versa. 
Using (1) we will be able to show in Section 10 that the concrete machine is noninterfering. From (2) 
we know that the concrete machine faithfully implements the abstract one, exactly reflecting its execution 
traces. 

9.1 Abstract and symbolic rule machines 

The symbolic rule machine (with the rule table $\mathcal{R}^{abs}$) is a simple reformulation of the abstract machine.

Their step relations are (extensionally) equal, and started from the same input data they emit the same 
traces. 

Definition 9.1 (Abstract and symbolic rule machines as generic machines). For both abstract and symbolic rule machines, input data is a 4-tuple $(p, args, n, L)$ where $p$ is a program, $args$ is a list of atoms (the initial stack), and $n$ is the size of the memory, initialized with $n$ copies of $0@L$. The initial PC is $0@L$.

Lemma 9.2. The symbolic rule machine instantiated with the rule table $\mathcal{R}^{abs}$ refines the abstract machine through $(=, =)$.

9.2 Concrete machine refines symbolic rule machine 

We prove this refinement using a fixed but arbitrary rule table, $\mathcal{R}$, an abstract lattice of labels, and a concrete lattice of tags. The proof uses the correctness of the fault handler (Section 7), so we assume that the fault handler of the concrete machine corresponds to the rule table of the symbolic rule machine ($\phi = \phi_{\mathcal{R}}$) and that the encoding of abstract labels as integer tags is correct.

Definition 9.3 (Concrete machine as generic machine). The input data of the concrete machine is a 4-tuple $(p, args, n, T)$ where $p$ is a program, $args$ is a list of concrete atoms (the initial stack), and the initial memory is $n$ copies of $0@T$. The initial PC is $0@T$. The machine starts in user mode, the cache is initialized with an illegal opcode so that the first instruction always faults (giving the fault handler a chance to run and install a correct rule without requiring the initialization process to invent one), and the fault handler code parameterizing the machine is installed in the initial privileged instruction memory $\phi$.

The input data and events of the symbolic rule and concrete machines are of different kinds; they are matched using relations ($\triangleright_i^c$ and $\triangleright_e^c$, respectively) stipulating that payload values should be equal and that labels should correspond to tags modulo the function Tag of the concrete lattice.

$$\frac{args' = \mathrm{map}\ (\lambda (n@L).\ n@\mathrm{Tag}(L))\ args}{(p, args, n, L) \ \triangleright_i^c\ (p, args', n, \mathrm{Tag}(L))} \qquad\qquad n@L \ \triangleright_e^c\ n@\mathrm{Tag}(L)$$
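To make the two matching relations concrete, the following sketch encodes atoms as `(payload, label)` pairs and uses a hypothetical two-point lattice whose `TAG` dictionary stands in for the function Tag. None of these names come from the formal development.

```python
TAG = {"L": 0, "H": 1}  # hypothetical Tag function for a two-point lattice


def concretize_input(p, args, n, label):
    """Build the unique concrete input matched with an abstract input."""
    concrete_args = [(payload, TAG[lab]) for (payload, lab) in args]
    return (p, concrete_args, n, TAG[label])


def events_match(abstract_event, concrete_event):
    """Events match when payloads are equal and the tag encodes the label."""
    (n, lab), (m, tag) = abstract_event, concrete_event
    return n == m and TAG[lab] == tag
```

For instance, `concretize_input("prog", [(5, "L"), (7, "H")], 4, "L")` maps every label in the initial stack, and the label of the initial memory contents, to its tag while leaving the program and memory size unchanged.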

Theorem 9.4. The concrete IFC machine refines the symbolic rule machine, through $(\triangleright_i^c, \triangleright_e^c)$.

We prove this theorem by a refinement via states (Lemma 9.7); this, in turn, relies on two technical 
lemmas (9.5 and 9.6). 

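We begin by defining a matching relation $\triangleright_s^c$ between the states of the concrete and symbolic rule machines such that

1. $i_q \triangleright_i^c i_c \Rightarrow Init_q(i_q) \triangleright_s^c Init_c(i_c)$,

2. $(\triangleright_s^c, \triangleright_e^c)$ is a state refinement of the symbolic rule machine into the concrete machine.

We define $\triangleright_s^c$ as

$$\frac{\mathcal{R} \vdash \kappa \qquad \sigma_q \triangleright_\sigma \sigma_c \qquad \mu_q \triangleright_m \mu_c}{\mu_q, [\sigma_q], n@L \ \ \triangleright_s^c\ \ \mathsf{u}, \kappa, \mu_c, [\sigma_c], n@\mathrm{Tag}(L)} \tag{2}$$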

where the new notations are defined as follows. The relation $\triangleright_m$ demands that the memories be equal up to the conversion of labels to concrete tags. The relation $\triangleright_\sigma$ on stacks is similar, but additionally requires that return frames in the concrete stack have their privilege bit set to $\mathsf{u}$. The basic idea is to match, in $\triangleright_s^c$, only concrete states that are in user mode. We also need to track an extra invariant, $\mathcal{R} \vdash \kappa$, which means that the cache $\kappa$ is consistent with the table $\mathcal{R}$, i.e., $\kappa$ never lies. More precisely, the output part of $\kappa$ represents the result of applying the symbolic rule judgment of $\mathcal{R}$ to the opcode and labels represented in the input part of $\kappa$:

$$\mathcal{R} \vdash [\kappa_i, \kappa_o] \triangleq \forall\, opcode\ L_1\ L_2\ L_3\ L_{pc},\ \ \kappa_i = (opcode, \mathrm{Tag}(L_{pc}), \mathrm{Tag}(L_1), \mathrm{Tag}(L_2), \mathrm{Tag}(L_3)) \Rightarrow$$
$$\exists\, L_{rpc}\ L_r,\ \ \vdash_{\mathcal{R}} (L_{pc}, L_1, L_2, L_3) \leadsto_{opcode} L_{rpc}, L_r \ \land\ \kappa_o = (\mathrm{Tag}(L_{rpc}), \mathrm{Tag}(L_r))$$
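The cache-consistency invariant can be checked by brute force when the lattice is finite. The sketch below assumes a hypothetical two-point lattice, a rule table given as a dictionary from opcodes to Python functions, and a single illustrative rule for `add`; it is not the rule table or checker of the formal development.

```python
from itertools import product

LABELS = ("L", "H")
TAG = {"L": 0, "H": 1}  # hypothetical encoding of labels as integer tags


def join(a, b):
    return "H" if "H" in (a, b) else "L"


def rule_add(lpc, l1, l2, l3):
    """Hypothetical rule: always allowed; result is labeled with l1 join l2."""
    return (lpc, join(l1, l2))


RULES = {"add": rule_add}


def cache_consistent(rules, cache_in, cache_out):
    """True if the cache output agrees with the rule for every label tuple
    that the cache input could be encoding (the cache 'never lies')."""
    for opcode, rule in rules.items():
        for lpc, l1, l2, l3 in product(LABELS, repeat=4):
            encoded = (opcode, TAG[lpc], TAG[l1], TAG[l2], TAG[l3])
            if cache_in != encoded:
                continue
            result = rule(lpc, l1, l2, l3)
            if result is None:           # rule forbids the step
                return False
            lrpc, lr = result
            if cache_out != (TAG[lrpc], TAG[lr]):
                return False
    return True
```

Note that a cache line whose input holds an opcode outside the table is vacuously consistent, which is why initializing the cache with an illegal opcode is harmless.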


To prove refinement via states, we must account for two situations. First, suppose the concrete machine can take a user step. In this case, we match that step with a single symbolic rule machine step. We write $cs^{\pi}$ to denote a concrete state $cs$ whose privilege bit is $\pi$.


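Lemma 9.5 (Refinement, non-faulting concrete step). Let $cs_1^{\mathsf{u}}$ be a concrete state and suppose that $cs_1^{\mathsf{u}} \xrightarrow{\alpha_c} cs_2^{\mathsf{u}}$. Let $qs_1$ be a symbolic rule machine state with $qs_1 \triangleright_s^c cs_1^{\mathsf{u}}$. Then there exist $qs_2$ and $\alpha_a$ such that $qs_1 \xrightarrow{\alpha_a} qs_2$, with $qs_2 \triangleright_s^c cs_2^{\mathsf{u}}$, and $\alpha_a \,[\triangleright_e^c]\, \alpha_c$. Graphically:

[Diagram: $qs_1 \triangleright_s^c cs_1^{\mathsf{u}}$ and $cs_1^{\mathsf{u}} \xrightarrow{\alpha_c} cs_2^{\mathsf{u}}$ are the premises; $qs_1 \xrightarrow{\alpha_a} qs_2$, $qs_2 \triangleright_s^c cs_2^{\mathsf{u}}$, and $\alpha_a \,[\triangleright_e^c]\, \alpha_c$ are the conclusions.]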


Proof. We know that $qs_1 \triangleright_s^c cs_1^{\mathsf{u}}$. By definition of $\triangleright_s^c$ in (2), $qs_1$ and $cs_1^{\mathsf{u}}$ are at the same opcode with the same stack and memory (up to translation between labels and tags), and $\mathcal{R} \vdash \kappa(cs_1^{\mathsf{u}})$. Thus $\kappa(cs_1^{\mathsf{u}})$ matches a line of the symbolic IFC rule table, and since the concrete machine performs a user step from $cs_1^{\mathsf{u}}$ to $cs_2^{\mathsf{u}}$, it is a line that allows a step to be taken. We conclude that the symbolic rule machine is able to perform the step to $qs_2$ as required. □


The second case is when the concrete machine faults into kernel mode and returns to user mode after 
some number of steps. 

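Lemma 9.6 (Refinement, faulting concrete step). Let $cs_0^{\mathsf{u}}$ be a concrete state, and suppose that the concrete machine does a faulting step to $cs_1^{\mathsf{k}}$, stays in kernel mode until $cs_n^{\mathsf{k}}$, and then exits kernel mode by stepping to $cs_{n+1}^{\mathsf{u}}$. Let $qs_0$ be a state of the symbolic rule machine that matches $cs_0^{\mathsf{u}}$. Then $qs_0 \triangleright_s^c cs_{n+1}^{\mathsf{u}}$. Graphically:

[Diagram: $qs_0 \triangleright_s^c cs_0^{\mathsf{u}}$ and the silent execution $cs_0^{\mathsf{u}} \xrightarrow{\tau} cs_1^{\mathsf{k}} \xrightarrow{\tau} \cdots \xrightarrow{\tau} cs_n^{\mathsf{k}} \xrightarrow{\tau} cs_{n+1}^{\mathsf{u}}$ are the premises; $qs_0 \triangleright_s^c cs_{n+1}^{\mathsf{u}}$ is the conclusion.]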


Proof. Since the concrete machine performs a faulting step from $cs_0^{\mathsf{u}}$ to $cs_1^{\mathsf{k}}$, we know that the current cache input, $\kappa_i(cs_1^{\mathsf{k}})$, corresponds to the current instruction and the tags it manipulates (they have been put there when entering kernel mode). Now, there are two cases. If evaluating the corresponding IFC rule at the symbolic rule level succeeds, then we apply Lemma 7.1 to conclude directly. Otherwise, we apply Lemma 7.2 and derive that the fault handler ends up in a failing state in kernel mode. This contradicts our initial hypothesis saying that the concrete machine performed a sequence of steps returning to user mode. □

Given two matching states of the concrete and symbolic rule machines, and a concrete execution starting 
at that concrete state, these two lemmas can be applied repeatedly to build a matching execution of the 
symbolic rule machine. There is just one last case to consider, namely when the execution ends with a fault 
into kernel mode and never returns to user mode. However, no output is produced in this case, guaranteeing 
that the full trace is matched. We thus derive the following refinement via states, of which Theorem 9.4 is 
a corollary. 
Lemma 9.7. The pair $(\triangleright_s^c, \triangleright_e^c)$ defines a refinement via states between the symbolic rule machine and the concrete machine.


9.3 Concrete machine refines abstract machine 

By composing the refinement of Lemma 9.2 and the refinement of Theorem 9.4 instantiated to the concrete machine running $\phi_{\mathcal{R}^{abs}}$, we can conclude that the concrete machine refines the abstract one:

Theorem 9.8. The concrete IFC machine refines the abstract IFC machine via $(\triangleright_i^c, \triangleright_e^c)$.

9.4 Abstract machine refines concrete machine 

The previous refinement, $(\triangleright_i^c, \triangleright_e^c)$, would also hold if the fault handler never returned when called. So, to ensure the concrete machine reflects the behaviors of the abstract machine, we next prove an inverse refinement:

Theorem 9.9. The abstract IFC machine refines the concrete IFC machine via $(\triangleright_i^{-c}, \triangleright_e^{-c})$, where $\triangleright_i^{-c}$ and $\triangleright_e^{-c}$ are the relational inverses of $\triangleright_i^c$ and $\triangleright_e^c$.

This guarantees that traces of the abstract machine are also emitted by the concrete machine. As above 
we use the symbolic rule machine as an intermediate step and show a state refinement of the concrete into 
the symbolic rule machine. We rely on the following lemma. 

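Lemma 9.10 (Forward refinement). Let $qs_0$ and $cs_0$ be two states with $cs_0 \triangleright_s^{-c} qs_0$. Suppose that the symbolic rule machine takes a step $qs_0 \xrightarrow{\alpha_a} qs_1$. Then there exist concrete state $cs_1$ and action $\alpha_c$ such that $cs_0 \xrightarrow{\alpha_c}{}^{*} cs_1$, with $cs_1 \triangleright_s^{-c} qs_1$ and $\alpha_c \,[\triangleright_e^{-c}]\, \alpha_a$. Graphically:

[Diagram: $cs_0 \triangleright_s^{-c} qs_0$ and $qs_0 \xrightarrow{\alpha_a} qs_1$ are the premises; the concrete execution $cs_0 \xrightarrow{\tau} \cdots \xrightarrow{\tau} \cdot \xrightarrow{\alpha_c} cs_1$, together with $cs_1 \triangleright_s^{-c} qs_1$ and $\alpha_c \,[\triangleright_e^{-c}]\, \alpha_a$, are the conclusions.]

where $\triangleright_s^{-c}$ and $\triangleright_e^{-c}$ denote the inverses of $\triangleright_s^c$ and $\triangleright_e^c$, respectively.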

Proof. Because $cs_0 \triangleright_s^{-c} qs_0$, the cache is consistent with the symbolic rule table $\mathcal{R}$. If the cache input matches the opcode and data of $cs_0$, then (because $qs_0 \xrightarrow{\alpha_a} qs_1$) the cache output must allow a step $cs_0 \xrightarrow{\alpha_c} cs_1$ as required. On the other hand, if the cache input does not match the opcode and data of $cs_0$, then a cache fault occurs, loading the cache input and calling the fault handler. By Lemma 7.1 and the fact that $qs_0 \xrightarrow{\alpha_a} qs_1$, the cache output is computed to be consistent with $\mathcal{R}$, and this allows the concrete step as claimed. □


9.5 Discussion 

The two top-level refinement properties (Theorem 9.4 and Theorem 9.9) share the same notion of matching 
relations but they have been proved independently in our Coq development. In the context of compiler 
verification [57, 81], another proof methodology has been favored: a backward simulation proof can be obtained from a proof of forward simulation under the assumption that the lower-level machine is deterministic. (CompCertTSO [81] also requires a receptiveness hypothesis that trivially holds in our context.) Since our concrete machine is deterministic, we could apply a similar technique. However, unlike in compiler verification where it is common to assume that the source program has a well-defined semantics (i.e., it does not get stuck), we would have to consider the possibility that the high-level semantics (the symbolic rule machine) might block and prove that in this case either the IFC enforcement judgment is stuck (and Lemma 9.6 applies) or the current symbolic rule machine state and matching concrete state are both ill-formed.
10 Noninterference


In this section we define TINI [2, 43] for generic machines (recall Definition 8.1), and present a set of unwinding conditions [37] sufficient to guarantee TINI for a generic machine (Theorem 10.3); we show that the abstract machine of Section 3 satisfies these unwinding conditions and thus satisfies TINI (Theorem 10.5), that TINI is preserved by refinement (Theorem 10.6), and finally, using the fact that the concrete IFC machine refines the abstract one (Theorem 9.8), that the concrete machine satisfies TINI (Theorem 10.8).


Termination-insensitive noninterference (TINI) To define noninterference, we need to talk about what 
can be observed about the output trace produced by a run of a machine. 

Definition 10.1 (Observation). A notion of observation for a generic machine is a 3-tuple $(\Omega, \lfloor\cdot\rfloor_{\cdot}, \cdot \approx_{\cdot}^{i} \cdot)$. $\Omega$ is a set of observers (i.e., different degrees of power to observe), ranged over by $o$. For each $o \in \Omega$, $\lfloor\cdot\rfloor_o \subseteq E$ is a predicate of observability of events for observer $o$, and $\cdot \approx_o^i \cdot \subseteq I \times I$ is a relation of indistinguishability of input data for observer $o$.

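The predicate $\lfloor e \rfloor_o$ is used to filter unobservable events from traces (written $\lfloor t \rfloor_o$):

$$\lfloor \epsilon \rfloor_o \triangleq \epsilon \qquad\qquad \lfloor e.t \rfloor_o \triangleq \begin{cases} e.\lfloor t \rfloor_o & \text{if } \lfloor e \rfloor_o \\ \lfloor t \rfloor_o & \text{otherwise} \end{cases}$$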

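Also a notion of indistinguishability of traces (written $t_1 \approx t_2$) is defined inductively:

$$\frac{}{\epsilon \approx t} \qquad \frac{}{t \approx \epsilon} \qquad \frac{t_1 \approx t_2}{e.t_1 \approx e.t_2} \tag{3}$$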


This definition truncates the longer trace to the same length as the shorter and then demands that the remaining elements be pairwise identical.
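Both operations on traces are easy to express on Python lists. The following is a minimal sketch in which events are `(payload, label)` pairs and the observer is hard-wired into a hypothetical predicate `low_observable`; the names are illustrative only.

```python
def filter_trace(observable, trace):
    """Keep only the events that the observer can see."""
    return [e for e in trace if observable(e)]


def traces_indistinguishable(t1, t2):
    """Truncate to the shorter trace, then require pairwise identical events.
    `zip` stops at the end of the shorter list, which performs the truncation."""
    return all(e1 == e2 for e1, e2 in zip(t1, t2))


def low_observable(event):
    """Hypothetical observer at level L of a two-point lattice."""
    _, label = event
    return label == "L"
```

In particular, the empty trace is indistinguishable from every trace, which is what makes the resulting noninterference property termination-insensitive.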

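Definition 10.2 (TINI). A machine $(S, E, I, \cdot \xrightarrow{\cdot} \cdot, Init)$ with a notion of observation $(\Omega, \lfloor\cdot\rfloor_{\cdot}, \cdot \approx_{\cdot}^{i} \cdot)$ satisfies TINI if, for any observer $o \in \Omega$, pair of indistinguishable initial data $i_1 \approx_o^i i_2$, and pair of executions $Init(i_1) \xrightarrow{t_1}{}^{*}$ and $Init(i_2) \xrightarrow{t_2}{}^{*}$, we have $\lfloor t_1 \rfloor_o \approx \lfloor t_2 \rfloor_o$.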


Notice that the input data for our machines includes the program to be executed; hence, we can apply 
the definition above to the execution of different programs. The reason for calling this notion “termination 
insensitive” is that, because of truncated traces in (3), we only model the case where we distinguish two runs 
of the same program by observing two distinguishable events that occur on the same position. Hence, this 
definition does not attempt to protect against attackers that try to learn a secret by seeing whether a program 
terminates or not: our observers cannot distinguish between successful termination, failure with an error, 
or entering an infinite loop with no observable output. This TINI property is standard for a machine with output [2, 43].*
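The TINI statement can be turned into a bounded test (not a proof) for toy machines. The sketch below assumes a machine given as a function `run` from input data to a finite trace, a finite list of candidate inputs, and a hypothetical two-point lattice; all names (`tini_counterexample`, `safe`, `leaky`, and so on) are illustrative.

```python
from itertools import combinations

FLOWS = {("L", "L"), ("L", "H"), ("H", "H")}  # hypothetical order L <= H


def observable(o, event):
    _, label = event
    return (label, o) in FLOWS


def indist_inputs(o, i1, i2):
    """Inputs are (public, secret) pairs; a low observer sees only `public`."""
    return i1 == i2 if o == "H" else i1[0] == i2[0]


def tini_counterexample(run, inputs, observers):
    """Search for an observer and two indistinguishable inputs whose
    filtered traces differ at some common position."""
    for o in observers:
        for i1, i2 in combinations(inputs, 2):
            if not indist_inputs(o, i1, i2):
                continue
            t1 = [e for e in run(i1) if observable(o, e)]
            t2 = [e for e in run(i2) if observable(o, e)]
            if any(a != b for a, b in zip(t1, t2)):
                return (o, i1, i2)
    return None


def safe(i):
    return [(i[0], "L"), (i[1], "H")]   # the secret is output at label H


def leaky(i):
    return [(i[1], "L")]                # the secret is output at label L
```

On the inputs `[(0, 0), (0, 1), (1, 0)]`, the search finds nothing for `safe` but reports the low observer and the first two inputs for `leaky`.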


Unwinding conditions Having defined TINI for generic notions of machine and observation, we now 
explain a sufficient set of conditions for such a machine to have the TINI property and sketch a proof of 
TINI from these conditions. The proof technique is standard [37]. 

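A silent action cannot be observed, so we extend the given predicate $\lfloor e \rfloor_o$ to actions by stating that $\lfloor \tau \rfloor_o$ never holds. From this we inductively define a notion of indistinguishability of actions to observer $o$ (written $\alpha_1 \approx_o^a \alpha_2$):

$$\frac{}{\alpha \approx_o^a \alpha} \qquad\qquad \frac{\neg\lfloor \alpha_1 \rfloor_o \qquad \neg\lfloor \alpha_2 \rfloor_o}{\alpha_1 \approx_o^a \alpha_2} \tag{4}$$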

Two actions are indistinguishable to o if either they are equal, or if neither can be observed by o. 

*It is called “progress-insensitive noninterference” in a recent survey [43]. We have stated it for inductively defined executions
and traces (1), which is all we need in this paper, but it can easily be lifted to coinductive executions and traces: not only successfully 
terminating and finitely failing executions, but also infinite executions. This holds because TINI is a 2-safety hyperproperty [22]; a 
formal proof of this can be found in our Coq development. 
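Theorem 10.3. A machine $(S, E, I, \cdot \xrightarrow{\cdot} \cdot, Init)$ with notion of observation $(\Omega, \lfloor\cdot\rfloor_{\cdot}, \cdot \approx_{\cdot}^{i} \cdot)$ satisfies TINI if, for each $o \in \Omega$, there exist two relations, indistinguishability of states to observer $o$ (written $s_1 \approx_o^s s_2$) and observability of states to observer $o$ (written $\lfloor s \rfloor_o$), satisfying four sanity conditions

$$i_1 \approx_o^i i_2 \Rightarrow Init(i_1) \approx_o^s Init(i_2) \tag{5}$$
$$s_1 \approx_o^s s_2 \Rightarrow s_2 \approx_o^s s_1 \tag{6}$$
$$s_1 \approx_o^s s_2 \Rightarrow (\lfloor s_1 \rfloor_o \Leftrightarrow \lfloor s_2 \rfloor_o) \tag{7}$$
$$(\lfloor \alpha \rfloor_o \land s \xrightarrow{\alpha}) \Rightarrow \lfloor s \rfloor_o \tag{8}$$

and three unwinding conditions, assuming $s_1 \approx_o^s s_2$ and $s_1 \xrightarrow{\alpha_1} s_1'$:

$$(\lfloor s_1 \rfloor_o \land s_2 \xrightarrow{\alpha_2} s_2') \Rightarrow (\alpha_1 \approx_o^a \alpha_2 \land s_1' \approx_o^s s_2') \tag{9}$$
$$(\neg\lfloor s_1 \rfloor_o \land \neg\lfloor s_1' \rfloor_o) \Rightarrow s_1' \approx_o^s s_2 \tag{10}$$
$$(\neg\lfloor s_1 \rfloor_o \land \lfloor s_1' \rfloor_o \land \lfloor s_2' \rfloor_o \land s_2 \xrightarrow{\alpha_2} s_2') \Rightarrow s_1' \approx_o^s s_2' \tag{11}$$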


We outline the proof, which motivates each of the sanity and unwinding conditions. To prove TINI we must consider pairs of traces of machine evaluations starting from initial states $Init(i_1)$ and $Init(i_2)$ and show that, after filtering for observability, these pairs of traces are indistinguishable. For the proof, we also maintain the invariant that the pairs of states reached by the two evaluations are indistinguishable. We are given that $i_1 \approx_o^i i_2$, so by (5) the initial states are indistinguishable, as are the traces emitted so far (namely $\epsilon$).

Now suppose the two evaluations have arrived at two indistinguishable states, $s_1 \approx_o^s s_2$, and that the filtered traces emitted so far are indistinguishable. If $s_1$ can take a step, $s_1 \xrightarrow{\alpha_1} s_1'$, what is possible for steps from $s_2$? (We may assume that $s_2 \xrightarrow{\alpha_2} s_2'$: if no step is possible from $s_2$ then we are already done because (3), used in the definition of TINI, truncates the trace from $s_1$ at this point.) Proceed by cases on observability of $s_1$.

Condition (9) says that, if $\lfloor s_1 \rfloor_o$, then the new states, $s_1'$ and $s_2'$, and the emitted traces remain indistinguishable.

On the other hand, suppose $\neg\lfloor s_1 \rfloor_o$; proceed by cases on observability of $s_1'$. (10) says that, if $\neg\lfloor s_1' \rfloor_o$, then $s_1' \approx_o^s s_2$; and by (8), since $s_1$ is unobservable, $\alpha_1$ must be unobservable, so the filtered emitted traces remain indistinguishable.

Finally, the case where $\neg\lfloor s_1 \rfloor_o$ and $\lfloor s_1' \rfloor_o$. Then $\neg\lfloor s_2 \rfloor_o$ (by (7)), and $\alpha_1$ and $\alpha_2$ are both unobservable by (8). Consider cases on observability of $s_2'$. The filtered traces emitted up to $s_1'$ and $s_2'$ are indistinguishable, and if $\lfloor s_2' \rfloor_o$ we are done by (11). If $\neg\lfloor s_2' \rfloor_o$, we are in a case symmetric to the paragraph above; by (6) and (10) we have $s_1 \approx_o^s s_2'$ and again the filtered traces emitted up to these points are indistinguishable. □

TINI for abstract IFC machine We now instantiate Theorem 10.3 with the abstract machine defined in 
Section 3, showing it satisfies TINI for the following notion of observation: 

Definition 10.4 (Observation for abstract machine). Let $\mathcal{L}$ be a lattice, with partial order $\le$. For the abstract machine, events $n@L$ are atoms; we define indistinguishability of atoms, $a_1 \approx_o^a a_2$, as in (4) above. The notion of observation for the abstract machine is $(\mathcal{L}, \lfloor\cdot\rfloor_{\cdot}^{abs}, \cdot \approx_{\cdot}^{abs} \cdot)$, where

$$\lfloor n@L \rfloor_o^{abs} \triangleq L \le o$$

$$(p, args_1, n, L) \approx_o^{abs} (p, args_2, n, L) \triangleq args_1 \,[\approx_o^a]\, args_2 .$$

(On the right-hand side of the second equation, $[\approx_o^a]$ is indistinguishability of atoms, lifted to lists as in Definition 8.2.)
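A small sketch may help to read this definition. It assumes, purely for illustration, a lattice whose labels are frozensets of principals ordered by inclusion (so `label <= o` is Python's subset test); atoms are `(payload, label)` pairs and inputs are 4-tuples as in Definition 9.1. The function names are hypothetical.

```python
def observable(atom, o):
    """An atom is observable when its label flows to the observer."""
    _, label = atom
    return label <= o  # subset test on frozensets of principals


def atoms_indist(a1, a2, o):
    """Equal, or both unobservable to o."""
    return a1 == a2 or (not observable(a1, o) and not observable(a2, o))


def inputs_indist(i1, i2, o):
    """Same program, memory size and label; argument lists of equal
    length that are pointwise indistinguishable."""
    (p1, args1, n1, l1), (p2, args2, n2, l2) = i1, i2
    return ((p1, n1, l1) == (p2, n2, l2)
            and len(args1) == len(args2)
            and all(atoms_indist(a, b, o) for a, b in zip(args1, args2)))


bottom = frozenset()
alice = frozenset({"alice"})
both = frozenset({"alice", "bob"})
i1 = ("prog", [(5, bottom), (7, both)], 4, bottom)
i2 = ("prog", [(5, bottom), (8, both)], 4, bottom)
```

The two inputs differ only in an atom labeled with both principals, so they are indistinguishable to the observer `alice` but not to the observer `both`.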

To instantiate Theorem 10.3 we must exhibit relations of observability and indistinguishability on states. 
We outline these definitions and the proofs of the sanity and unwinding conditions here. 
A state $s = (\mu\ [\sigma]\ pc)$ of the abstract machine is observable by observer $o \in \mathcal{L}$, written $\lfloor s \rfloor_o$, whenever $pc = n@L_{pc}$ is itself observable, i.e., $L_{pc} \le o$.

Indistinguishability of states is defined by two clauses: the first for observable states (left), and the other 
for non-observable ones (right). 


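$$\frac{\lfloor pc \rfloor_o \qquad \sigma_1 \,[\approx_o^a]\, \sigma_2 \qquad \mu_1 \,[\approx_o^a]\, \mu_2}{\mu_1\ [\sigma_1]\ pc \ \approx_o^s\ \mu_2\ [\sigma_2]\ pc} \qquad\qquad \frac{\neg\lfloor pc_1 \rfloor_o \qquad \neg\lfloor pc_2 \rfloor_o \qquad \sigma_1 \approx_o^{\sigma} \sigma_2 \qquad \mu_1 \,[\approx_o^a]\, \mu_2}{\mu_1\ [\sigma_1]\ pc_1 \ \approx_o^s\ \mu_2\ [\sigma_2]\ pc_2}$$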


Here we abuse the notation of lifting, $[\approx_o^a]$, using it for memories and stacks (two stack elements are indistinguishable if they are indistinguishable atoms, or are both return stack frames, with indistinguishable return addresses).

Let’s have a more detailed look at the definition of state indistinguishability. For observable states, 
we simply require that all the state components be indistinguishable. For non-observable ones, however, 
we must make the relation more permissive. Indeed, the abstract IFC machine steps from an observable 
state to a non-observable state when, e.g., branching on the value of a secret. When that happens, the tight 
correspondence on states no longer holds. Depending on the value of a secret, the machine could, e.g., 
jump to different instruction addresses, put different numbers of values on its stack, perform more or fewer 
function calls, etc. Because of that, we must allow states with different pc values to be related, and adopt 
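a weaker indistinguishability relation on stacks. This new relation, written $\sigma_1 \approx_o^{\sigma} \sigma_2$, is used only when relating unobservable states, and intuitively says that the stacks of such states only need to be related up to the most recent return frame to an observable one. Formally, $\sigma_1 \approx_o^{\sigma} \sigma_2$ is defined as $\lfloor \sigma_1 \rfloor_o \,[\approx_o^a]\, \lfloor \sigma_2 \rfloor_o$, where:

$$\lfloor [\,] \rfloor_o \triangleq [\,] \qquad\qquad \lfloor n@L, \sigma \rfloor_o \triangleq \lfloor \sigma \rfloor_o \qquad\qquad \lfloor n@L; \sigma \rfloor_o \triangleq \begin{cases} n@L; \sigma & \text{if } L \le o \\ \lfloor \sigma \rfloor_o & \text{otherwise} \end{cases}$$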


In this way we relax the correspondence between call stacks of two machines, while at the same time 
keeping the invariant that holds on the “observable” part of the stacks, which we will need when proving 
Equation 11 for the abstract machine. 
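The relaxed stack relation can be sketched as follows. Stacks are Python lists with the top at index 0; each element is a hypothetical triple `(kind, n, label)` with `kind` either `"data"` or `"ret"`, and `leq` is an assumed ordering on a two-point lattice. This is an illustration of the idea, not the definition used in the formal development.

```python
FLOWS = {("L", "L"), ("L", "H"), ("H", "H")}


def leq(l1, l2):
    return (l1, l2) in FLOWS


def observable_part(stack, o):
    """Drop everything above the most recent return frame to an observable pc."""
    for idx, (kind, _, label) in enumerate(stack):
        if kind == "ret" and leq(label, o):
            return stack[idx:]
    return []


def elem_indist(e1, e2, o):
    """Same kind of element, and equal or both unobservable."""
    (k1, _, l1), (k2, _, l2) = e1, e2
    if k1 != k2:
        return False
    return e1 == e2 or (not leq(l1, o) and not leq(l2, o))


def stacks_related(s1, s2, o):
    p1, p2 = observable_part(s1, o), observable_part(s2, o)
    return len(p1) == len(p2) and all(elem_indist(a, b, o) for a, b in zip(p1, p2))


s1 = [("data", 1, "H"), ("ret", 7, "H"), ("ret", 3, "L"), ("data", 4, "L")]
s2 = [("data", 9, "H"), ("data", 8, "H"), ("data", 6, "H"), ("ret", 3, "L"), ("data", 4, "L")]
```

The stacks `s1` and `s2` have different heights and different contents above the low return frame, yet they are related for the observer `"L"` because their observable parts coincide.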

Theorem 10.5. The relations $\lfloor\cdot\rfloor_{\cdot}$ and $\cdot \approx_{\cdot}^{s} \cdot$ on abstract machine states satisfy the sanity and unwinding conditions of Theorem 10.3; thus, the abstract IFC machine has TINI.


Proof. Most sanity conditions are easy consequences of the definitions, and do not require detailed explanation. We give an overview of the most interesting aspects of the proof; a more detailed account can be
found in the formal development. 

The Output instruction plays an important role for condition (8) and for the first conclusion of (9). 
Crucially, since that instruction joins the label of the current pc to the output atom, an unobservable state 
necessarily produces an unobservable action. Further, when two low states are indistinguishable and step 
(i.e., when they satisfy the preconditions of (9)), the atoms on top of the stack must be indistinguishable, 
leading to indistinguishable output actions. 

As for the second conclusion of (9), since indistinguishable low states have equal pc values, they execute the same instructions. Thus, showing that the states remain indistinguishable after stepping is just a
matter of reasoning about the values that are used by each instruction on both states. These values must be 
indistinguishable, and it is easy to show that storing them at the same locations in indistinguishable stacks 
and memories leads to stacks and memories that are still indistinguishable. 

Most of the cases of condition (10)—stepping from an unobservable state to another unobservable 
state—are trivial, since they only manipulate values or unobservable return frames on top of the stack (which 
by construction are irrelevant when checking whether the corresponding stacks are indistinguishable). The 
only exception is the Store instruction, which also modifies the memory. Since the label on the pc is 
assumed to be above the level of the observer, the side condition of that instruction ensures that the same 
holds of the memory position being updated. This ensures that both memories remain indistinguishable, 
since the other positions are not affected. 





Finally, the precondition of (11) (stepping from unobservable to observable states) only applies when
both states execute matching Ret instructions. Since we assume that the resulting states are both observable, 
we conclude that the top of the original stacks contained the same observable return frame. The definition 
of indistinguishability says that the portions of the stacks below that frame are indistinguishable. Since 
those are exactly the values of the new stacks, and the returning pc is the same on both states, we conclude 
that the resulting observable states are indistinguishable. □ 

TINI preserved by refinement 

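Theorem 10.6 (TINI preservation). Suppose that the generic machine $M_2$ refines $M_1$ by refinement $(\triangleright_i, \triangleright_e)$ and that each machine is equipped with a notion of observation. Suppose that, for all observers $o_2$ of $M_2$, there exists an observer $o_1$ of $M_1$ such that the following compatibility conditions hold:

1. for all $e_1 \in E_1$ and $e_2 \in E_2$, $e_1 \triangleright_e e_2 \Rightarrow (\lfloor e_1 \rfloor_{o_1} \Leftrightarrow \lfloor e_2 \rfloor_{o_2})$

2. for all $i_2, i_2' \in I_2$, $i_2 \approx_{o_2}^i i_2' \Rightarrow \exists\, i_1, i_1' \in I_1.\ i_1 \approx_{o_1}^i i_1' \land i_1 \triangleright_i i_2 \land i_1' \triangleright_i i_2'$

3. for all $e_1, e_1' \in E_1$, and all $e_2, e_2' \in E_2$, $(e_1 = e_1' \land e_1 \triangleright_e e_2 \land e_1' \triangleright_e e_2') \Rightarrow e_2 = e_2'$

Then, if $M_1$ has TINI, $M_2$ also has TINI.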

Proof. We include a brief proof sketch to convey the meaning of the theorem and the role of the compatibility conditions; intuitively, they say that $o_1$ does not have more observation power than $o_2$. Suppose that $o_2$ observes two traces $t_2$ and $t_2'$, starting from initial states $i_2$ and $i_2'$. We want to show that both traces are indistinguishable whenever the initial states are. By condition 2, we can find related initial states $i_1$ and $i_1'$ of $M_1$ that are indistinguishable. Since $M_2$ refines $M_1$, we know that these initial states produce traces $t_1$ and $t_1'$ that match $t_2$ and $t_2'$; furthermore, since $M_1$ has TINI, $t_1$ and $t_1'$ are indistinguishable. By condition 1, filtering related traces results in related traces; that is, $\lfloor t_1 \rfloor_{o_1} \,[\triangleright_e]\, \lfloor t_2 \rfloor_{o_2}$, and similarly for the other two traces. This implies, thanks to condition 3, that we can use the indistinguishability of $\lfloor t_1 \rfloor_{o_1}$ and $\lfloor t_1' \rfloor_{o_1}$ to argue that $\lfloor t_2 \rfloor_{o_2}$ and $\lfloor t_2' \rfloor_{o_2}$ are also indistinguishable, by a simple induction on the traces. □

Some formulations of noninterference are subject to the refinement paradox [48], in which refinements of a noninterferent system may violate noninterference. We avoid this issue by employing a strong notion of noninterference that restricts the amount of non-determinism in the system and is thus preserved by any refinement (Theorem 10.6).† Since our abstract machine is deterministic, it is easy to show this strong
notion of noninterference for it. In Section 13 we discuss a possible technique for generalizing to the 
concurrent setting while preserving a high degree of determinism. 

TINI for concrete machine with IFC fault handler It remains to define a notion of observation on the 
concrete machine, instantiating the definition of TINI for this machine. This definition refers to a concrete 
lattice CL, which must be a correct encoding of an abstract lattice $\mathcal{L}$; the lattice operators genBot, genJoin, and genFlows must satisfy the specifications in Section 7.

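Definition 10.7 (Observation for the concrete machine). Let $\mathcal{L}$ be an abstract lattice, and CL be correct with respect to $\mathcal{L}$. The observation for the concrete machine is $(\mathcal{L}, \lfloor\cdot\rfloor_{\cdot}^{con}, \cdot \approx_{\cdot}^{con} \cdot)$, where

$$\lfloor n@T \rfloor_o^{con} \triangleq \mathrm{Lab}(T) \le o,$$

$$(p, args_1', n, T) \approx_o^{con} (p, args_2', n, T) \triangleq args_1 \,[\approx_o^a]\, args_2,$$

and $args_i' = \mathrm{map}\ (\lambda (n@L).\ n@\mathrm{Tag}(L))\ args_i$.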

Finally, we prove that the backward refinement proved in Section 9 (Theorem 9.8) satisfies the compatibility constraints of Theorem 10.6, so we derive the main result:

Theorem 10.8. The concrete IFC machine running the fault handler $\phi_{\mathcal{R}^{abs}}$ satisfies TINI.
instr            extensions to instruction set
Alloc            allocate a new frame
SizeOf           fetch frame size
Eq               value equality
SysCall id       system call
GetOff           extract pointer offset
Pack             atom from payload and tag
Unpack           atom into payload and tag
PushCachePtr     push cache address on stack
Dup n            duplicate atom on stack
Swap n           swap two data atoms on stack

Figure 11: Additional instructions for extensions


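$$\frac{\iota(n) = \mathit{Alloc} \qquad \mathit{alloc}\ k\ (L \vee L_{pc})\ a\ \mu = (id, \mu')}{\mu\ [(\mathrm{Int}\ k)@L, a, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu'\ [(\mathrm{Ptr}\ (id, 0))@L, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathit{SizeOf} \qquad \mathrm{length}(\mu(id)) = k}{\mu\ [(\mathrm{Ptr}\ (id, o))@L, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(\mathrm{Int}\ k)@L, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathit{GetOff}}{\mu\ [(\mathrm{Ptr}\ (id, o))@L, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(\mathrm{Int}\ o)@L, \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathit{Eq}}{\mu\ [v_1@L_1, v_2@L_2, \sigma]\ n@L_{pc} \xrightarrow{\tau} \mu\ [(\mathrm{Int}\ (v_1 == v_2))@(L_1 \vee L_2), \sigma]\ (n+1)@L_{pc}}$$

$$\frac{\iota(n) = \mathit{SysCall}\ id \qquad T(id) = (k, f) \qquad f(\sigma_1) = v@L \qquad \mathrm{length}(\sigma_1) = k}{\mu\ [\sigma_1 \mathbin{+\!+} \sigma_2]\ n@L_{pc} \xrightarrow{\tau} \mu\ [v@L, \sigma_2]\ (n+1)@L_{pc}}$$

Figure 12: Semantics of selected new abstract machine instructions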
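$$\frac{\iota(n) = \mathit{Alloc} \qquad \mu(\mathit{cache}) = [\,\mathit{alloc}, T_{pc}, T_1, T_D, T_D, T_{rpc}, T_r\,] \qquad \mathit{alloc}\ k\ \mathsf{u}\ a\ \mu = (id, \mu')}{\mathsf{u}\ \mu\ [(\mathrm{Int}\ k)@T_1, a, \sigma]\ n@T_{pc} \xrightarrow{\tau} \mathsf{u}\ \mu'\ [(\mathrm{Ptr}\ (id, 0))@T_r, \sigma]\ (n+1)@T_{rpc}}$$

$$\frac{\phi(n) = \mathit{Alloc} \qquad \mathit{alloc}\ k\ \mathsf{k}\ a\ \mu = (id, \mu')}{\mathsf{k}\ \mu\ [(\mathrm{Int}\ k)@\_, a, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu'\ [(\mathrm{Ptr}\ (id, 0))@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = \mathit{PushCachePtr}}{\mathsf{k}\ \mu\ [\sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu\ [(\mathrm{Ptr}\ (\mathit{cache}, 0))@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = \mathit{Unpack}}{\mathsf{k}\ \mu\ [v_1@v_2, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu\ [v_2@T_D, v_1@T_D, \sigma]\ (n+1)@T_D}$$

$$\frac{\phi(n) = \mathit{Pack}}{\mathsf{k}\ \mu\ [v_2@\_, v_1@\_, \sigma]\ n@\_ \xrightarrow{\tau} \mathsf{k}\ \mu\ [v_1@v_2, \sigma]\ (n+1)@T_D}$$

$$\frac{\iota(n) = \mathit{SysCall}\ id \qquad T(id) = (k, n') \qquad \mathrm{length}(\sigma_1) = k}{\mathsf{u}\ \mu\ [\sigma_1 \mathbin{+\!+} \sigma_2]\ n@T \xrightarrow{\tau} \mathsf{k}\ \mu\ [\sigma_1 \mathbin{+\!+} ((n+1)@T, \mathsf{u}); \sigma_2]\ n'@T_D}$$

Figure 13: Semantics of selected new concrete machine instructions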


11 An Extended System 

Thus far we have presented our methodology in the context of a simple machine architecture and IFC 
discipline. We now show how it can be scaled up to a significantly more sophisticated setting, where the 
basic machine is extended with a frame-based memory model supporting dynamic allocation and a system
call mechanism for adding special-purpose primitives. Building on these features, we define an abstract 
IFC machine that uses sets of principals as its labels and a corresponding concrete machine implementation 
where tags are pointers to dynamically allocated representations of these sets. While still much less complex 
than the real SAFE system, this extended model serves as good evidence of the robustness of our approach, and
how it might apply to more realistic designs: The new features were added by incrementally adapting the 
Coq formalization of the basic system, without requiring any major changes to the initial proof architecture. 

Figure 11 shows the new instructions supported by the extended model. Instructions PushCachePtr,
Unpack, and Pack are used only by the concrete machine, for the compiled fault handler (hence they only 
have a kernel-mode stepping rule; they simply get stuck if executed outside kernel mode, or on an abstract 
machine). We also add two stack-manipulation instructions, Dup and Swap, to make programming the 
kernel routines more convenient. It remains true that any program for the abstract machine makes sense to 
run on the symbolic rule machine and the concrete machine. For brevity, we detail stepping rules only for
the extended abstract IFC machine (Figure 12) and concrete machine (Figure 13); corresponding extensions 
to the symbolic IFC rule machine are straightforward (we also omit rules for Dup and Swap). Individual 
rules are explained below. 

11.1 Dynamic memory allocation 

High-level programming languages usually assume a structured memory model, in which independently 
allocated frames are disjoint by construction and programs cannot depend on the relative placement of 
frames in memory. The SAFE hardware enforces this abstraction by attaching explicit runtime types to all 
values, distinguishing pointers from other data. Only data marked as pointers can be used to access memory. 
To obtain a pointer, one must either call the (privileged) memory manager to allocate a fresh frame or else offset an existing pointer. In particular, it is not possible to “forge” a pointer from an integer. Each pointer also carries information about its base and bounds, and the hardware prevents it from being used to access memory outside of its frame.

†The recent noninterference proof for the seL4 microkernel [65, 66] works similarly (see Section 12).

Frame-based memory model In our extended system, we model the user-level view of SAFE's memory system by adding a frame-structured memory (similar to [58]), distinguished pointers (so values, the payload field of atoms and the tag field of concrete atoms, can now either be an integer (Int n) or a pointer (Ptr p)), and an allocation instruction to our basic machines. We do this (nearly) uniformly at all levels of abstraction.‡ A pointer is a pair $p = (id, o)$ of a frame identifier $id$ and an offset $o$ into that frame. In the machine state, the data memory $\mu$ is a partial function from pointers to individual storage cells that is undefined on out-of-frame pointers. By abuse of notation, $\mu$ is also a partial function from frame identifiers to frames, which are just lists of atoms.

The most important new rule of the extended abstract machine is Alloc (Figure 12). In this machine 
there is a separate memory region (assumed infinite) corresponding to each label. The auxiliary function 
alloc in the rule for Alloc takes a size $k$, the label (region) at which to allocate, and a default atom $a$; it extends $\mu$ with a fresh frame of size $k$, initializing its contents to $a$. It returns the id of the new frame and the extended memory $\mu'$.

IFC and memory allocation We require that the frame identifiers produced by allocation at one label 
not be affected by allocations at other labels; e.g., alloc might allocate sequentially in each region. Thus, 
indistinguishability of low atoms is just syntactic equality, preserving Definition 10.4 from the simple abstract machine, which is convenient for proving noninterference, as we explain below. We allow a program
to observe frame sizes using a new SizeOf instruction, which requires tainting the result of Alloc with L, 
the label of the size argument. There are also new instructions Eq, for comparing two values (including 
pointers) for equality, and GetOff, for extracting the offset field of a pointer into an integer. However, frame 
ids are intuitively abstract: the concrete representation of frame ids is not accessible, and pointers cannot 
be forged or output. The extended concrete machine stepping rules for these new instructions are analogous 
to the abstract machine rules, with the important exception of Alloc, which is discussed below. 
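
As a minimal illustration of the per-label allocation discipline described above, the following Python sketch gives each label its own region with a sequential counter. The class name AbstractMemory, the (label, index) shape of frame ids, and the string labels "L" and "H" are assumptions made for this example only; unlike the functional alloc of Figure 12, the sketch updates the memory in place.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractMemory:
    """Hypothetical model: one region per label, frame ids are (label, index)."""
    regions: dict = field(default_factory=dict)  # label -> list of frames

    def alloc(self, k, label, default):
        """Add a fresh frame of size k, filled with `default`, to label's region."""
        frames = self.regions.setdefault(label, [])
        frame_id = (label, len(frames))  # sequential within this region only
        frames.append([default] * k)
        return frame_id

    def size_of(self, frame_id):
        label, index = frame_id
        return len(self.regions[label][index])


def low_ids(high_allocs):
    """Do `high_allocs` allocations at label H, then two allocations at label L."""
    mem = AbstractMemory()
    for _ in range(high_allocs):
        mem.alloc(4, "H", 0)
    return [mem.alloc(2, "L", 0), mem.alloc(3, "L", 0)]
```

Because the counter for "L" never observes allocations at "H", low_ids returns the same frame ids regardless of its argument, which is why low pointers can be compared by plain equality at this level.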

A few small modifications to existing instructions in the basic machine (Figure 2) are needed to handle 
pointers properly. In particular: (i) Load and Store require pointer arguments and get stuck if the pointer’s 
offset is out of range for its frame; (ii) Add takes either two integers or an integer and a pointer, where 
Int n + Int m = Int (n+m) and Ptr (id, o₁) + Int o₂ = Ptr (id, o₁+o₂); and (iii) Output works only on 
integers, not pointers. Analogous modifications are needed in the concrete machine semantic rules. 
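
The modified Add and Load can be sketched in Python as follows. This is an illustration under stated assumptions, not the Coq definitions: tags are omitted, memory is a dictionary from frame ids to lists, the Stuck exception stands for a machine with no applicable rule, and accepting the integer and the pointer in either order is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    n: int


@dataclass(frozen=True)
class Ptr:
    frame: object  # abstract frame identifier
    off: int       # offset into that frame


class Stuck(Exception):
    """Hypothetical marker for a machine state with no applicable rule."""


def add(a, b):
    if isinstance(a, Int) and isinstance(b, Int):
        return Int(a.n + b.n)
    if isinstance(a, Ptr) and isinstance(b, Int):
        return Ptr(a.frame, a.off + b.n)  # only the offset changes
    if isinstance(a, Int) and isinstance(b, Ptr):
        return Ptr(b.frame, b.off + a.n)  # assumption: either order is allowed
    raise Stuck("Add is undefined on two pointers")


def load(memory, p):
    """memory: dict from frame id to list of cells; gets stuck out of range."""
    if not isinstance(p, Ptr):
        raise Stuck("Load requires a pointer")
    frame = memory.get(p.frame)
    if frame is None or not (0 <= p.off < len(frame)):
        raise Stuck("out-of-frame access")
    return frame[p.off]
```

Note that no operation builds a Ptr from an Int alone, which mirrors the requirement that pointers cannot be forged.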

Concrete allocator The extended concrete machine’s semantics for Alloc differ from those of the abstract 
machine in one key respect. Using one region per tag would not be a realistic strategy for a concrete 
implementation; e.g., the number of different tags might be extremely large. Instead, we use a single region 
for all user-mode allocations at the concrete level. We also collapse the separate user and kernel memories 
from the basic concrete machine into a single memory. Since we still want to be able to distinguish user and 
kernel frames, we mark each frame with a privilege mode (i.e., we use two allocation regions). Figure 13 
shows the corresponding concrete stepping rule for Alloc for two cases: non-faulting user mode and kernel 
mode. The rule cache is now just a distinguished kernel frame cache; to access it, the fault handler uses 
the (privileged) PushCachePtr instruction. The concrete Load and Store rules are modified to prevent 
dereferencing kernel pointers in user mode. These checks are only needed if we want to allow user-level 
code to manipulate kernel pointers directly while protecting the data structures they point to. For instance, 
we could allow certain operations on pointers representing labels, such as taking the join of two labels, while 
preserving noninterference. If kernel pointers cannot be “leaked” into user data (as in subsection 11.3), these 
checks can be safely omitted, since user-level code won’t be able to tamper with kernel data. 

Proof by refinement As before, we prove noninterference for the concrete machine by combining a proof 
of noninterference of the abstract machine with a two-stage proof that the concrete machine refines the 
abstract machine. By using this approach we avoid some well-known difficulties in proving noninterference 
directly for the concrete machine. In particular, when frames allocated in low and high contexts share the 
same region, allocations in high contexts can cause variations in the precise pointer values returned for 
allocations in low contexts, and these variations must be taken into account when defining the 
indistinguishability relation. For example, Banerjee and Naumann [11] prove noninterference by parameterizing 
their indistinguishability relation with a partial bijection that keeps track of indistinguishable memory 
addresses. Our approach, by contrast, defines pointer indistinguishability only at the abstract level, where 
indistinguishable low pointers are identical. This proof strategy still requires relating memory addresses 
when showing refinement, but this relation does not appear in the noninterference proof at the abstract 
level. The refinement proof itself uses a simplified form of memory injections [58]. The differences in the 
memory region structure of both machines are significant, but invisible to programs, since no information 
about frame ids is revealed to programs beyond what can be obtained by comparing pointers for equality. 
This restriction allows the refinement proof to go through straightforwardly. 

⁸It would be interesting to describe an implementation of the memory manager in a still-lower-level concrete machine with no 
built-in Alloc instruction, but we leave this as future work. 
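
To make the difficulty concrete, the sketch below allocates the same sequence of frames in both styles and records the mapping from abstract to concrete frame ids. It is only an illustration: the function name allocate_both, the integer concrete ids, and the dictionary used as an injection are assumptions of this example and are much simpler than the memory injections used in the actual proof.

```python
def allocate_both(labels):
    """Allocate one frame per entry of `labels`, abstractly and concretely.

    Abstract ids come from one counter per label; concrete ids come from a
    single shared counter. Returns a dict from abstract id to concrete id.
    """
    per_label = {}      # abstract machine: next index in each label's region
    next_concrete = 0   # concrete machine: one region for all user frames
    injection = {}
    for label in labels:
        index = per_label.get(label, 0)
        per_label[label] = index + 1
        injection[(label, index)] = next_concrete
        next_concrete += 1
    return injection


run1 = allocate_both(["L", "L"])
run2 = allocate_both(["H", "L", "H", "L"])

# Abstract low ids agree across the two runs; concrete ones need not.
low1 = {a for a in run1 if a[0] == "L"}
low2 = {a for a in run2 if a[0] == "L"}
```

Here low1 and low2 coincide, while the concrete id of the second low frame differs between the runs; relating such concrete ids is exactly the bookkeeping that is confined to the refinement proof.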

11.2 System calls 

To support the implementation of policy-specific primitives on top of the concrete machine, we provide a 
new system call instruction. The SysCall id instruction is parameterized by a system call identifier. The 
step relation of each machine is now parameterized by a table T that maps system call identifiers to their 
implementations. 

In the abstract and symbolic rule machines, a system call implementation is an arbitrary Coq function 
that removes a list of atoms from the top of the stack and either puts a result on top of the stack or fails, 
halting the machine. The system call implementation is responsible for computing the label of the result 
and performing any checks that are needed to ensure noninterference. 
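
A system call table of this kind might be modeled as follows. This is a hedged Python sketch rather than the Coq development: the stored arity, the Halt exception, the example call sum2, and the use of frozensets as labels are all assumptions made for illustration. The stack is a list with its top at index 0.

```python
class Halt(Exception):
    """Hypothetical marker: the system call failed and the machine halts."""


def sum2(args):
    """Example call: add two payloads and join (union) their labels."""
    (n1, l1), (n2, l2) = args
    return (n1 + n2, l1 | l2)


# Table T: system call identifier -> (number of arguments, implementation).
TABLE = {"sum2": (2, sum2)}


def sys_call(table, ident, stack):
    """Pop the call's arguments from the stack and push its single result."""
    if ident not in table:
        raise Halt(ident)
    arity, impl = table[ident]
    if len(stack) < arity:
        raise Halt("not enough arguments")
    args, rest = stack[:arity], stack[arity:]
    return [impl(args)] + rest  # impl computes the result label itself
```

The implementation, not the machine, decides the label of the result, so each entry of the table carries its own noninterference obligation.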

In the concrete machine, system calls are implemented by kernel routines and the call table contains the 
entry points of these routines in the kernel instruction memory. Executing a system call involves inserting 
the return address on the stack (underneath the call arguments) and jumping to the corresponding entry 
point. The kernel code terminates either by returning a result to the user program or by halting the machine. 

This feature has no major impact on the proofs of noninterference and refinement. For noninterference, 
we must show that all the abstract system calls preserve indistinguishability of abstract machine states; 
for refinement, we show that each concrete system call correctly implements the abstract one using the 
machinery of Section 7. 

11.3 Labeling with sets of principals 

The full SAFE machine supports dynamic creation of security principals. In the extended model, we make 
a first step toward dynamic principal creation by taking principals to be integers and instantiating the 
(parametric) lattice of labels with the lattice of finite sets of integers. This lattice is statically known, but models 
dynamic creation by supporting unbounded labels and having no top element. In this lattice, ⊥ is ∅, ∨ is ∪, 
and ≤ is ⊆. We enrich our IFC model by adding a new classification primitive joinP that adds a principal 
to an atom’s label, encoded using the system call mechanism described above. The operation of joinP is 
given by the following derived rule, which is an instance of the SysCall rule from Figure 12. 

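\[
\frac{\iota(n) = \mathit{SysCall}\ \mathit{joinP}}
     {\mu\ [v@L_1,\ (\mathit{Int}\ m)@L_2,\ \sigma]\ \ n@L_{pc} \;\xrightarrow{\tau}\; \mu\ [v@(L_1 \vee L_2 \vee \{m\}),\ \sigma]\ \ (n+1)@L_{pc}}
\]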
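
Read operationally, the derived rule for joinP amounts to the following stack transformation over the lattice of finite sets of integers. The Python names and the encoding of atoms as (payload, label) pairs with frozenset labels are assumptions of this sketch; the stack top is at index 0, as in the rule.

```python
BOT = frozenset()          # bottom element: the empty set of principals


def join(l1, l2):
    return l1 | l2         # join is set union


def flows(l1, l2):
    return l1 <= l2        # the flows-to order is set inclusion


def join_p(stack):
    """[v@L1, (Int m)@L2, rest] becomes [v@(L1 join L2 join {m}), rest]."""
    (v, l1), (m, l2), *rest = stack
    return [(v, join(join(l1, l2), frozenset({m})))] + rest
```

The label L2 of the principal m is joined into the result because the choice of principal may itself depend on secret data.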

At the concrete level, a tag is now a pointer to an array of principals (integers) stored in kernel memory. 
To keep the fault handler code simple, we do not maintain canonical representations of sets: one set may 
be represented by different arrays, and a given array may have duplicate elements. (As a consequence, 
the mapping from abstract labels to tags is no longer a function; we return to this point below.) Since the 
fault handler generator in the basic system is parametric in the underlying lattice, it doesn’t require any 
modification. All we must do is provide concrete implementations for the appropriate lattice operations: 
genJoin just allocates a fresh array and concatenates both argument arrays into it; genFlows checks for array 
inclusion by iterating through one array and testing whether each element appears in the other; and genBot 
allocates a new empty array. Finally, we provide kernel code to implement joinP, which requires two new 
privileged instructions, Pack and Unpack (Figure 13), to manipulate the payload and tag fields of atoms; 
otherwise, the implementation is similar to that of genJoin. 
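
The behavior of these three operations on array-based tags can be sketched in Python. These functions are analogues of what the generated kernel code computes, not the code generators themselves; modeling kernel memory as a list of arrays, using list indices as pointers, and the names tag_bot, tag_join, tag_flows, and interp are assumptions of this example.

```python
def tag_bot(kmem):
    """Allocate a new empty array; return a pointer (index) to it."""
    kmem.append([])
    return len(kmem) - 1


def tag_join(kmem, t1, t2):
    """Allocate a fresh array holding the concatenation of both arguments."""
    kmem.append(kmem[t1] + kmem[t2])  # duplicates are deliberately kept
    return len(kmem) - 1


def tag_flows(kmem, t1, t2):
    """Array inclusion: every element of t1's array occurs in t2's array."""
    return all(any(x == y for y in kmem[t2]) for x in kmem[t1])


def interp(kmem, tag):
    """The abstract label (a set of principals) that a tag denotes in kmem."""
    return frozenset(kmem[tag])


kmem = [[1, 2], [2, 3]]    # two existing tags, at pointers 0 and 1
j = tag_join(kmem, 0, 1)   # kmem[j] is [1, 2, 2, 3]
```

Since tag_join never mutates existing arrays, older tags keep their meaning, and since duplicates are kept, several distinct arrays can denote the same abstract label.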

A more realistic system would keep canonical representations of sets and avoid unnecessary allocation 
in order to improve its memory footprint and tag cache usage. But even with the present simplistic approach, 
both the code for the lattice operations and their proofs of correctness are significantly more elaborate than 
for the trivial two-point lattice. In particular, we need an additional code generator to build counted loops, 
e.g., for computing the join of two tags. 

genFor c = [Dup] ++ genIf (genLoop (c ++ [Push (−1), Add])) [] 

where genLoop c = c ++ [Dup, Bnz (−(length c + 1))] 

Here, c is a code sequence representing the loop body, which is expected to preserve an index value on top 
of the stack; the generator builds code to execute that body repeatedly, decrementing the index each time 
until it reaches 0. The corresponding specification is 

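\[
\frac{
\begin{array}{l}
P_n(\kappa, \sigma) := \exists\, T\, \sigma'.\ \sigma = n@T, \sigma' \wedge \mathit{Inv}(\kappa, \sigma) \\
Q_n(\kappa, \sigma) := \exists\, T\, \sigma'.\ \sigma = n@T, \sigma' \wedge \forall\, T'.\ \mathit{Inv}(\kappa, ((n-1)@T', \sigma')) \\
\forall\, n.\ 0 < n \implies \{P_n\}\ c\ \{Q_n\} \\
P(\kappa, \sigma) := \exists\, n\, T\, \sigma'.\ 0 \le n \wedge \sigma = n@T, \sigma' \wedge \mathit{Inv}(\kappa, \sigma) \\
Q(\kappa, \sigma) := \exists\, T\, \sigma'.\ \sigma = 0@T, \sigma' \wedge \mathit{Inv}(\kappa, \sigma)
\end{array}
}{
\{P\}\ \mathit{genFor}\ c\ \{Q\}
}
\]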

To avoid reasoning about memory updates as far as possible, we code in a style where all local context 
is stored on the stack and manipulated using Dup and Swap. Although the resulting code is lengthy, it is 
relatively easy to automate the corresponding proofs. 
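
The genFor generator can be exercised on a toy interpreter, as in the Python sketch below. Several things here are assumptions made for illustration: tags are ignored, Bnz k pops the top of the stack and jumps by k relative to its own position when the popped value is nonzero, gen_if and its helpers are one plausible encoding of a conditional using relative jumps, and Emit is a hypothetical pseudo-instruction that records the top of the stack so that loop iterations become visible. The stack top is at index 0.

```python
def gen_skip_if(n):
    return [("Bnz", n + 1)]                   # skip n instructions if nonzero


def gen_skip(n):
    return [("Push", 1)] + gen_skip_if(n)     # unconditional skip


def gen_if(t, f):
    """Pop the top of the stack; run t if it is nonzero, f otherwise."""
    f2 = f + gen_skip(len(t))
    return gen_skip_if(len(f2)) + f2 + t


def gen_loop(c):
    return c + [("Dup",), ("Bnz", -(len(c) + 1))]


def gen_for(c):
    return [("Dup",)] + gen_if(gen_loop(c + [("Push", -1), ("Add",)]), [])


def run(code, stack):
    """Execute code until the pc leaves it; return (final stack, trace)."""
    stack, trace, pc = list(stack), [], 0
    while 0 <= pc < len(code):
        op = code[pc]
        if op[0] == "Push":
            stack.insert(0, op[1]); pc += 1
        elif op[0] == "Dup":
            stack.insert(0, stack[0]); pc += 1
        elif op[0] == "Add":
            a, b = stack.pop(0), stack.pop(0)
            stack.insert(0, a + b); pc += 1
        elif op[0] == "Bnz":
            pc += op[1] if stack.pop(0) != 0 else 1
        elif op[0] == "Emit":                 # hypothetical, for observation
            trace.append(stack[0]); pc += 1
        else:
            raise ValueError(op)
    return stack, trace


final_stack, trace = run(gen_for([("Emit",)]), [3])
```

With the index 3 on top of the stack, the body runs for the index values 3, 2, and 1, and the loop exits with 0 on top, matching the postcondition Q; with an initial index of 0 the body is skipped entirely.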

Stateful encoding of labels Changing the representation of tags from integers to pointers requires 
modifying one small part of the basic system proof. Recall that in Section 6 we described the encoding of labels 
into tags as a pure function Lab. To deal with the memory-dependent and non-canonical representation of 
sets described above, the extended system instead uses a relation between an abstract label, a concrete tag 
that encodes it, and a memory in which this tag should be interpreted. 

If tags are pointers to data structures, it is crucial that these data structures remain intact as long as 
the tags appear in the machine state. We guarantee this by maintaining the very strong invariant that each 
execution of the fault handler only allocates new frames, and never modifies the contents of existing ones, 
except for the cache frame (which tags never point into). A more realistic implementation might use 
mutable kernel memory for other purposes and garbage collect unused tags; this would require a more 
complicated memory invariant. 

The TINI formulation is similar in essence to the one in Section 10, but some subtleties arise for 
concrete output events, since their tags cannot be interpreted on their own anymore. We wish to (i) keep the 
semantics of the concrete machine independent of high-level policies such as IFC and (ii) give a statement 
of noninterference that does not refer to pointers. To achieve these seemingly contradictory aims, we model 
an event of the concrete machine as a pair of a concrete atom plus the whole state of the kernel memory. 
This memory is not visible to observers in the formulation of TINI, but instead determines which events’ 
payloads they are able to observe. This is done by extending our notion of observation with a function that 
interprets every concrete event present in the output trace in higher-level terms. This interpretation abstracts 
away from low-level representation issues, such as the layout of data structures in memory, and allows us to 
give a more natural definition of event indistinguishability in the formulation of TINI. For instance, in the 
extended system described above, the interpretation of a pointer tag is the set of principals that that pointer 
represents in kernel memory—that is, the contents of the array it points to. This allows us to define the 
event indistinguishability relation by simple equality. 
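
A small Python sketch of this interpretation step follows. It assumes, for illustration only, that a concrete event is a pair of a (payload, tag) atom and a kernel memory given as a dictionary from pointers to arrays of principals, and that an observer is characterized by a set of principals; the function names are hypothetical.

```python
def interpret(event):
    """Map a concrete event to an abstract one: payload plus set of principals."""
    (payload, tag), kernel_memory = event
    return (payload, frozenset(kernel_memory[tag]))


def observe(trace, observer_label):
    """Payloads of the events whose interpreted label flows to the observer."""
    abstract = [interpret(e) for e in trace]
    return [p for (p, label) in abstract if label <= observer_label]


# Two concrete events with different pointers and different array layouts.
e1 = ((42, 0), {0: [1, 2]})
e2 = ((42, 5), {5: [2, 1, 2]})
```

Both events interpret to the same abstract event, so they can be related by plain equality even though their tags and kernel memories differ; an observer labeled {1, 2, 3} sees the payload 42, while an observer labeled {1} sees nothing.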

Our model of observation in terms of an interpretation function is an idealization of what happens in the 
real SAFE machine, where communication of labeled data with the outside world involves cryptography. 
Extending this model is left as future work. 

12 Related Work 


The SAFE design spans a number of research areas, and a comprehensive overview of related work would 
be huge. We focus here on a small set of especially relevant points of comparison. 

Language-based IFC Static approaches to IFC have generally dominated language-based security 
research [69, 73, 77, 93]; however, statically enforcing IFC at the lowest level of a real system is 
challenging. Soundly analyzing native binaries with reasonable precision is hard (static IFC for low-level 
code usually stops at the bytecode level [13, 38, 42, 59]), even more so without the compiler’s 
cooperation (e.g., for stripped or obfuscated binaries). Proof-carrying code [12, 13, 38] and typed assembly 
language [61, 94, 95] have been used for enforcing IFC on low-level code without low-level analysis or 
adding the compiler to the TCB. In SAFE [29, 34] we follow a different approach, enforcing 
noninterference using purely dynamic checks, for arbitrary binaries in a custom-designed instruction set. The 
mechanisms we use for this are similar to those found in recent work on purely dynamic IFC for high-level 
languages [1, 4, 5, 6, 7, 40, 41, 44, 45, 63, 72, 75, 78, 83, 86]; however, as far as we know, we are the first 
to push these ideas to the lowest level. 

seL4 Murray et al. [66] recently demonstrated a machine-checked noninterference proof for the 
implementation of the seL4 microkernel. This proof is carried out by refinement and reuses the specification and 
most of the existing functional correctness proof of seL4 [53]. Like the TINI property in this paper, the 
variant of intransitive noninterference used by Murray et al. is preserved by refinement because it implies a 
high degree of determinism [65]. This organization of their proof was responsible for a significant saving in 
effort, even when factoring in the additional work required to remove all observable non-determinism from 
the seL4 specification. Beyond these similarities, SAFE and seL4 rely on completely different mechanisms 
to achieve different notions of noninterference (seL4 admits intransitive IFC policies, capturing the “where” 
dimension of declassification [79], while we consider transitive ones). Whereas, in SAFE, each word of 
data has an IFC label and labels are propagated on each instruction, the seL4 kernel maintains separation 
between several large partitions (e.g., one partition can run an unmodified version of Linux) and ensures 
that information is conveyed between such partitions only in accordance with a fixed access control policy. 

PROSPER In parallel work, Dam et al. [27, 28, 52] verified information flow security for a tiny 
proof-of-concept separation kernel running on ARMv7 and using a Memory Management Unit for physical 
protection of memory regions belonging to different partitions. The authors argue that noninterference is not 
well suited for systems in which components are supposed to communicate with each other. Instead, they 
use the bisimulation proof method to show trace equivalence between the real system and an ideal top-level 
specification that is secure by construction. As in seL4 [66], the proof methodology precludes an abstract 
treatment of scheduling, but the authors contend this is to be expected when information flow is to be taken 
into account. In more recent work, Balliu et al. [10] propose a symbolic execution-based information flow 
analysis for machine code, and use this technique to verify a separation kernel system call handler, a UART 
device driver, and a crypto service modular exponentiation routine. 

TIARA and ARIES The SAFE architecture embodies a number of innovations from earlier paper 
designs. In particular, the TIARA design [84] first proposed the idea of a zero-kernel operating system and 
sketched a concrete architecture, while the ARIES project proposed using a hardware rule cache to speed 
up information-flow tracking [16]. In TIARA and ARIES, tags had a fixed set of fields and were of limited 
length, whereas, in SAFE, tags are pointers to arbitrary data structures, allowing them to represent complex 
IFC labels encoding sophisticated security policies [62], for instance decentralized ones [69, 85]. 
Moreover, unlike TIARA and ARIES, which made no formal soundness claims, SAFE proposes a set of IFC 
rules aimed at achieving noninterference; the proof we present in this paper, though for a simplified model, 
provides evidence that this goal is feasible. 

RIFLE and other binary-rewriting-based IFC systems RIFLE [91] enforces user-specified 
information-flow policies for x86 binaries using binary rewriting, static analysis, and augmented hardware. Binary 
rewriting is used to make implicit flows explicit; it heavily relies on static analysis for reconstructing the 
program’s control-flow graph and performing reaching-definitions and alias analysis. The augmented 
hardware architecture associates labels with registers and memory and updates these labels on each instruction 
to track explicit flows. Additional security registers are used by the binary translation mechanism to help 
track implicit flows. Beringer [14] recently proved (in Coq) that the main ideas in RIFLE can be used 
to achieve noninterference for a simple While language. Unlike RIFLE, SAFE achieves noninterference 
purely dynamically and does not rely on binary rewriting or heroic static analysis of binaries. Moreover, 
the SAFE hardware is generic, simply caching instances of software-managed rules. 

While many other information flow tracking systems based on binary rewriting have been proposed, few 
are concerned with soundly handling implicit flows [23, 60], and even these do so only to the extent they 
can statically analyze binaries. Since, unlike RIFLE (and SAFE), these systems use unmodified hardware, 
the overhead for tracking implicit flows can be large. To reduce this overhead, recent systems track implicit 
flows selectively [51] or not at all [49, 74]—arguably a reasonable tradeoff in settings such as malware 
analysis or attack detection, where speed and precision are more important than soundness. 

Hardware taint tracking The last decade has seen significant progress in specialized hardware for 
accelerating taint tracking [18, 25, 26, 31, 32, 89, 92]. Most commonly, a single tag bit is associated with each 
word to specify if it is tainted or not. Initially aimed at mitigating low-level memory corruption attacks by 
preventing the use of tainted pointers and the execution of tainted instructions [18, 25, 89], hardware-based 
taint tracking has also been used to prevent high-level attacks such as SQL injection and cross-site 
scripting [26]. In contrast to SAFE, these systems prioritize efficiency and overall helpfulness over the soundness 
of the analysis, striking a heuristic balance between false positives and false negatives (missed attacks). As 
a consequence, these systems ignore implicit flows and often do not even track all explicit flows. While 
early systems supported a single hard-coded taint propagation policy, recent ones allow the policy to be 
defined in software [26, 31, 92] and support monitoring policies that go beyond taint tracking [19, 31, 32, 76]. 
Harmoni [31], for example, provides a pair of caches that are quite similar to the SAFE rule cache. Possibly 
these could even be adapted to enforcing noninterference, in which case we expect the proof methodology 
introduced here to apply. 

Timing and termination Our TINI property ignores both termination and timing: a program that 
diverges, fails, or takes varying amounts of time to run based on a sensitive input is considered secure. The 
full SAFE design includes a clearance-based access-control mechanism [86] for addressing termination and 
timing covert channels (i.e., high-bandwidth channels through which malicious code can exfiltrate secrets 
it directly has access to). Stefan et al. [87] have also shown that in a concurrent setting such leaks can 
be prevented by an adapted IFC mechanism, at the risk of spawning very large numbers of threads. We 
believe that this IFC mechanism could also be enforced using the hardware mechanisms we describe here. 
A recently proposed technique for instruction-based scheduling [17, 88] is aimed at preventing leaks via 
the internal timing side-channel (e.g., malicious code sharing the same processor inferring secrets through 
timing variations arising from cache misses). This could probably be adapted to SAFE, and since the SAFE 
processor is very simple the mitigation could work well [24]. Finally, several mechanisms have been 
proposed for mitigating the external timing side-channel (i.e., leakage of secrets to an attacker making timing 
observations over the network) and thus reducing the rate at which bits can be leaked [3, 98]. We do not 
consider any of these attacks or mitigations in this work. 

Verification of low-level code The distinctive challenge in verifying machine code is coping with 
unstructured control flow. Our approach using structured generators to build the fault handler is similar to 
the mechanisms used in Chlipala’s Bedrock system [20, 21] and by Jensen et al. [50], but there are several 
points of difference. These systems each build macros on top of a powerful low-level program logic for 
machine code (Ni and Shao’s XCAP [71], in the case of Bedrock), whereas we take a simpler, ad-hoc approach, 
building directly on our stack machine’s relatively high-level semantics. Both these systems are based on 
separation logic, which we can do without since (at least in the present simplified model) we have very few 
memory operations to reason about. We have instead focused on developing a simple Hoare logic 
specifically suited to verifying structured runtime-system code; e.g., we omit support for arbitrary code pointers, 
but add support for reasoning about termination. We use total-correctness Hoare triples (similar to Myreen 
and Gordon [70]) and weakest preconditions to guarantee progress, not just safety, for our handler code. 
Finally, our level of automation is much more modest than Bedrock’s, though still adequate to discharge 
most verification conditions on straight-line stack manipulation code rapidly and often automatically. 

Work on testing noninterference The abstract machine in Section 3 was proposed by Hrițcu et al. [46], 
extended in this work with dynamic allocation and data classification (Section 11), and recently further 
extended by Hrițcu et al. to a sophisticated machine featuring a highly permissive flow-sensitive dynamic 
enforcement mechanism, public labels, and registers [47]. While the focus of that work is on verifying 
noninterference by random testing, it also shows how to use invariants discovered during testing to formalize 
proofs of noninterference in Coq. 

Although the abstract machine and IFC mechanism considered here are simpler than the most complex 
ones of Hrițcu et al. [47], our main concerns are the concrete machine, the IFC fault handler, and the 
key properties of this combination, all of which are novel. We believe nevertheless that our methodology 
could be extended to that setting as well, verifying an implementation of this extended IFC machine by 
a lower-level one. Depending on the hardware capabilities at the lower level, some of the features of the 
machine could have to be implemented in software, requiring further proofs. For instance, this extended 
IFC machine still relies on a protected stack for soundly performing function calls and returns: on a call, 
the entire register file is stored on this stack, so that it can be restored upon a return, thereby preventing data 
leakage. At the lowest level, this protected stack could be implemented with a regular stack living in kernel 
space, managed through special system calls. 

Tagging hardware beyond IFC Although the tagging mechanism we discuss arose in the context of 
the SAFE system, and was primarily designed for information-flow control, it is sufficiently generic to be 
implemented in other architectures and to enforce more security policies. 

In follow-on work, Dhawan et al. [35] adapt the tagging mechanism to a more conventional RISC 
processor, using it to implement policies such as memory safety and control-flow integrity. They evaluate the 
performance of the mechanism on benchmark simulations, which indicate a modest impact on speed 
(typically under 10%) and power ceiling (less than 10%), even when enforcing multiple policies simultaneously. 

Azevedo de Amorim et al. [9] use Coq to formalize a generic version of the symbolic machine of 
Section 4; that machine is different from the one discussed here in that it is based on a more conventional 
processor design (e.g., with registers instead of a protected stack), and serves as a high-level substrate for 
programming many different security policies, including compartmentalization and memory safety. Finally, 
they formulate the intended effect of each policy as a security property, using formal proofs to show that 
each policy enforces the corresponding property. 

A recent project at Draper Labs [30] is working to extend the RISC-V processor with tag propagation 
hardware in the style of the SAFE processor. As of March 2016, a prototype able to boot Linux is running 
on FPGA boards. 


13 Conclusions and Future Work 

We have presented a formal model of the key IFC mechanisms of the SAFE system: propagating and 
checking tags to enforce security, using a hardware cache for common-case efficiency and a software fault handler 
for maximum flexibility. To formalize and prove properties at such a low level (including features such as 
dynamic memory allocation and labels represented by pointers to in-memory data structures), we first 
construct a high-level abstract specification of the system, then refine it in two steps into a realistic concrete 
machine. A bidirectional refinement methodology allows us to prove (i) that the concrete machine, loaded 
with the right fault handler (i.e., one correctly implementing the IFC enforcement of the abstract specification), 
satisfies a traditional notion of termination-insensitive noninterference, and (ii) that the concrete machine 
reflects all the behaviors of the abstract specification. Our formalization reflects the programmability of the 
fault handling mechanism, in that the fault handler code is compiled from a rule table written in a small 
DSL. We set up a custom Hoare logic to specify and verify the corresponding machine code, following the 
structure of a simple compiler for this DSL. 

The development in this paper concerns three deterministic machines and simplifies away concurrency. 
While the lack of concurrency is a significant current limitation that we would like to remove by moving to a 
multithreading single-core model, we still want to maintain the abstraction layers of a proof-by-refinement 
architecture. This requires some care so as not to run afoul of the refinement paradox [48] since some 
standard notions of noninterference (for example possibilistic noninterference) are not preserved by refinement 
in the presence of non-determinism. One promising path toward this objective is inspired by the recent 
noninterference proof for seL4 [65, 66]. If we manage to share a common thread scheduler between the 
abstract and concrete machines, we could still prove a strong double refinement property (concrete refines 
abstract and vice versa) and hence preserve a strong notion of noninterference (such as the TINI notion 
from this work) or a possibilistic variation. 

Although this paper focuses on IFC and noninterference, the tagging facilities of the concrete machine 
are completely generic and have been used since to enforce completely different properties like memory 
safety, compartment isolation, and control-flow integrity [9]. Moreover, although the rule cache / fault 
handler design arose in the context of SAFE, it has since been adapted to a conventional RISC processor [35]. 